From: Junio C Hamano
To: ZheNing Hu
Cc: Jeff King, Taylor Blau, Git List, johncai86@gmail.com, Linus Torvalds
Subject: Re: [Question] Can git cat-file have a type filtering option?
References: <20230410201414.GC104097@coredump.intra.peff.net> <20230412074309.GB1695531@coredump.intra.peff.net> <20230414073035.GB540206@coredump.intra.peff.net>
Date: Fri, 14 Apr 2023 08:58:39 -0700
In-Reply-To: (ZheNing Hu's message of "Fri, 14 Apr 2023 20:17:34 +0800")

ZheNing Hu writes:

> Oh, you are right, this could be to prevent conflicts between Git objects
> with identical content but different types. However, I always associate
> Git with the file system, where metadata such as file type and size is
> stored in the inode, while the file data is stored in separate chunks.

I am afraid the presentation order Peff used caused a bit of
confusion. The true reason is what Peff brought up as "Or worse".
We need to be able to tell, given only the name of an object,
everything that we need to know about the object, and for that, we
need the type information when we ask for an object by its name.

Having size embedded in the data that comes back to us when we
consult the object database with an object name helps the
implementation to pre-allocate a buffer and then inflate into
it--there is no fundamental reason why it should be there.

It is a secondary problem created by the design choice that we store
type together with contents, that the object type recorded in a tree
entry may contradict the actual type of the object recorded in the
tree entry. We could have declared that the object type found in a
tree entry is to be trusted, if we didn't record the type in the
object database together with the object contents.

I think your original question was not "why do we store type and
size together with the contents?", but was "why do we include them
in the hash computation?", and all of the above discusses a related
tangent without touching the original question.

The need to have type or size available when we ask the object
database for data associated with the object does not necessarily
mean they must be hashed together with the contents. It was done
merely because "why not? that way, we do not have to worry about
catching corrupt values for type and size information we want to
store together with the contents".

IOW, we could have checksummed these two pieces of information
separately, but why bother?
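
To make the above concrete, here is a minimal Python sketch of the loose-object layout in a SHA-1 repository, where the object name is the hash of "<type> <size>", a NUL byte, and then the contents. It is an illustration only, not how Git itself is written: the function names (object_name, write_loose, read_loose) are made up for this example, and packed objects encode type and size differently.

```python
import hashlib
import zlib
from typing import Tuple


def object_name(obj_type: str, content: bytes) -> str:
    """Hash "<type> <size>\\0" followed by the content (SHA-1 repositories)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def write_loose(obj_type: str, content: bytes) -> bytes:
    """Deflate header and content together, as a loose object stores them."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def read_loose(raw: bytes) -> Tuple[str, int, bytes]:
    """Learn type and size from the first few inflated bytes, then the rest."""
    inflater = zlib.decompressobj()
    # Inflate at most 32 bytes first: enough for "<type> <size>\0" in this
    # sketch, so the size is known before the whole body is inflated.
    head = inflater.decompress(raw, 32)
    header, sep, start = head.partition(b"\0")
    if not sep:
        raise ValueError("no NUL terminator found in object header")
    obj_type, size_text = header.decode("ascii").split(" ", 1)
    size = int(size_text)
    body = start + inflater.decompress(inflater.unconsumed_tail) + inflater.flush()
    if len(body) != size:
        raise ValueError("recorded size does not match the inflated contents")
    return obj_type, size, body


if __name__ == "__main__":
    # Same (empty) contents, different type: the names differ because the
    # type is part of what gets hashed.
    print(object_name("blob", b""))
    print(object_name("tree", b""))
    print(read_loose(write_loose("blob", b"hello\n")))
```

Because the header is covered by the same hash as the contents, a corrupted type or size shows up as a name mismatch when the object is re-hashed, which is the "why not?" convenience described above; a separate checksum for the header would be the alternative.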