From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F833CA0FF3 for ; Fri, 1 Sep 2023 19:18:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346414AbjIATQo (ORCPT ); Fri, 1 Sep 2023 15:16:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51292 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229750AbjIATQn (ORCPT ); Fri, 1 Sep 2023 15:16:43 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73BF410DF for ; Fri, 1 Sep 2023 12:16:40 -0700 (PDT) Received: (qmail 30790 invoked by uid 109); 1 Sep 2023 19:16:40 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 01 Sep 2023 19:16:40 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 22336 invoked by uid 111); 1 Sep 2023 19:16:40 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 01 Sep 2023 15:16:40 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 1 Sep 2023 15:16:39 -0400 From: Jeff King To: Kousik Sanagavarapu Cc: Junio C Hamano , git@vger.kernel.org Subject: Re: [PATCH] ref-filter: sort numerically when ":size" is used Message-ID: <20230901191639.GA1955435@coredump.intra.peff.net> References: <20230901142624.12063-1-five231003@gmail.com> <20230901174540.GB1947546@coredump.intra.peff.net> <20230901183206.GA1952051@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Sat, Sep 02, 2023 at 12:29:07AM +0530, Kousik Sanagavarapu wrote: > > > > I think they are covered implicitly by the "else" block of the > > > > conditional that checks for FIELD_STR. > > > > > > Ah, OK. That needs to be future-proofed to force future developers > > > who want to add different FIELD_FOO type to look at the comparison > > > logic. If we want to do so, it should be done as a separate topic > > > for cleaning-up the mess, not as part of this effort. > > What I also find weird is the fact that we assign a "cmp_type" to the > whole atom. Like "contents" is FIELD_STR and "objectsize" is "FIELD_ULONG" > in "valid_atom". This seems wrong because the options of the atoms should be > the ones deciding the "cmp_type", no? I think the data structure is a little confusing if you haven't worked with it before. But basically each "atom" corresponds to a single "%()" block in the format. So if you ran: git for-each-ref --format="%(contents:size) %(contents:body)" you'd have two atoms in the used_atom struct: one for the size and one for the body. IMHO the code would be a lot easier to work with if the atoms were structured as a parse tree with child pointers (especially when you get into things like "if" that have sub-expressions). I think one of the reasons that used_atom is an array is to de-duplicate repeated mentions (so if you formatted "%(foo) %(foo)" it would only have to store the computed value once). But I think that is the wrong way to optimize it. We shouldn't be storing any strings per-atom, but rather walking the parse tree to produce a single output buffer. And the values should be cheap to fill in, because we should parse the object as necessary up front. This is more or less the way the pretty.c parser does it. But that is all quite a large tangent from what you're working on, and would probably be a ground-up rewrite of the formatting code. You can safely ignore my rant for the purposes of your patch. ;) > I wanted to leave the "cmp_type" field of the atom untouched because that > would mess up this "global" setting of "contents" to be a "FIELD_STR" (or > even "raw" for that matter). Although that seems like a bad idea, after > I've read Junio's and your comments. Yeah, I agree that would be a problem if there were one global "contents". But we are allocating a new atom struct on the fly for the contents:size directive that we parse, and so on. -Peff