All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Dhruva Krishnamurthy <dhruvakm@gmail.com>,
	 John Cai <johncai86@gmail.com>,
	 Karthik Nayak <karthik.188@gmail.com>,
	git@vger.kernel.org
Subject: Re: using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42
Date: Wed, 01 May 2024 15:40:45 -0700	[thread overview]
Message-ID: <xmqqikzxi2aa.fsf@gitster.g> (raw)
In-Reply-To: <20240501220030.GA1442509@coredump.intra.peff.net> (Jeff King's message of "Wed, 1 May 2024 18:00:30 -0400")

Jeff King <peff@peff.net> writes:

>   - the cache here is static-local in the function. It should probably
>     at least be predicated on the tree_oid, and maybe attached to the
>     repository object? I think having one per repository at a time would
>     be fine (generally the tree_oid is set once per process, so it's not
>     like you're switching between multiple options).

It should be per tree_oid or you will get a stale and incorrect
result when you read paths from a different tree.  But thanks for
that "something simple and stupid" code to clearly demonstrate
that repeated reading of the attributes data is the problem.

Given a tree with a name, the result of reading a path from that
tree does not depend on the repository the tree appears in, so the
cache does not have a reason to be tied to a particular repository.
Generally we work only inside a single repository, so attaching the
cache to that single repository would be a good way to make it
available globally without adding another global variable, as
the_repository can serve as the starting point for the global state,
but other than that there is no reason.

I agree that the attribute layer may be a better place to cache this
data.  As you pointed out, it already has a caching behaviour in its
attr_stack data structure that is optimized for local walk that
visits every path in a tree in depth first order, but it is likely
that a different caching scheme that is more suitable for random
access may need to be introduced.  The cache eviction strategy may
need some thought (the attr_stack based caching has an obviously
optimal eviction strategy---to evict the attribute data read from a
directory when the traversal leaves that directory) in order to
avoid unbounded bloat of the cached data.

> I've cc'd John as the author of 2386535511. But really, that was just
> enabling by default the attr-tree code added by 47cfc9bd7d (attr: add
> flag `--source` to work with tree-ish, 2023-01-14). Although in that
> original context (git check-attr) the lack of caching would be much less
> important.
>
> -Peff

  parent reply	other threads:[~2024-05-01 22:40 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01  5:26 Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 Dhruva Krishnamurthy
2024-05-01 22:00 ` using tree as attribute source is slow, was " Jeff King
2024-05-01 22:37   ` rsbecker
2024-05-01 22:40   ` Junio C Hamano [this message]
2024-05-02  0:33   ` Taylor Blau
2024-05-02 17:33     ` Taylor Blau
2024-05-02 17:44       ` Junio C Hamano
2024-05-02 17:55         ` Taylor Blau
2024-05-02 19:01           ` Karthik Nayak
2024-05-02 21:08             ` Junio C Hamano
2024-05-03  5:37               ` Dhruva Krishnamurthy
2024-05-03 15:34                 ` Re* " Junio C Hamano
2024-05-03 17:46                   ` Jeff King
2024-05-06 20:28                     ` Taylor Blau
2024-05-13 20:16                   ` John Cai
2024-06-05 21:43                   ` [PATCH] attr.tree: HEAD:.gitattributes is no longer the default in a bare repo Junio C Hamano
2024-06-06  8:32                     ` Jeff King
2024-06-06 16:02                       ` Junio C Hamano
2024-05-02 18:34         ` using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 Dhruva Krishnamurthy
2024-05-02  0:45   ` Dhruva Krishnamurthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqikzxi2aa.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=dhruvakm@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.