* [GSOC][QUESTION] How to parse the properties of the object at once
From: ZheNing Hu @ 2021-08-07 6:32 UTC
To: Junio C Hamano, Git List, Ævar Arnfjörð Bjarmason,
Jeff King, Christian Couder
Hi guys,
parse_object_buffer() calls parse_tag_buffer() and parse_commit_buffer()
to parse object data and store the result in `struct tag` and
`struct commit`, so that callers such as grab_tag_values() and
grab_commit_values() can read the parsed data directly later.
But parse_object_buffer() parses only part of the object data, so
ref-filter needs some additional parsing, such as grab_person() and
grab_sub_body_contents(). This repeated parsing hurts performance.
So I am wondering whether we can add some members to `struct commit` or
`struct tag`, so that we can collect more kinds of data during the
parsing pass. This extra parsing would be optional: we could set several
hook pointers to say which kinds of data we need, like
oid_object_info_extended() does. That way we would not lose much
performance when we don't need them.
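
A minimal sketch of the optional-pointer idea described above, assuming a
single pass over the commit header. `struct commit_field_request` and
`parse_commit_fields()` are hypothetical names used for illustration only;
they are not part of Git. A NULL member means the caller did not ask for
that field, so nothing is stored for it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request struct: a NULL member means "not requested". */
struct commit_field_request {
	const char **author;   /* receives a pointer to the author ident text */
	size_t *author_len;    /* receives its length, excluding the newline */
	const char **message;  /* receives a pointer to the start of the message */
};

/*
 * Hypothetical helper: walk the header lines once and fill in only the
 * fields the caller requested. Pointers refer into buf; nothing is copied.
 */
static void parse_commit_fields(const char *buf,
				struct commit_field_request *req)
{
	const char *p = buf;

	assert(buf && req);

	/* Header lines end at the first empty line. */
	while (*p && *p != '\n') {
		const char *eol = strchr(p, '\n');

		if (!eol)
			eol = p + strlen(p);
		if (req->author && !strncmp(p, "author ", 7)) {
			*req->author = p + 7;
			if (req->author_len)
				*req->author_len = (size_t)(eol - (p + 7));
		}
		p = *eol ? eol + 1 : eol;
	}
	/* Skip the blank separator line to reach the message. */
	if (req->message && *p == '\n')
		*req->message = p + 1;
}
```

A caller that only wants the author would leave `message` as NULL, and the
sketch then records nothing for the message.
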
But in commit.h I found this comment:
/*
* The size of this struct matters in full repo walk operations like
* 'git clone' or 'git gc'. Consider using commit-slab to attach data
* to a commit instead of adding new fields here.
*/
This means that I shouldn't touch the contents of `struct commit`. So I
looked at the code of `commit-slab`, and it seems that it does additional
parsing. But what I am hoping for is to let parse_commit_buffer() parse
the commit data only once.
In addition, I am thinking about whether to build a huge "struct object_view"
to store the state and results of the parsed objects' properties.
Any good ideas?
--
ZheNing Hu
* Re: [GSOC][QUESTION] How to parse the properties of the object at once
From: Junio C Hamano @ 2021-08-07 7:15 UTC
To: ZheNing Hu
Cc: Git List, Ævar Arnfjörð Bjarmason, Jeff King,
Christian Couder
ZheNing Hu <adlternative@gmail.com> writes:
> This means that I shouldn't touch the content of struct commit. So I see the
> code of `commit-slab`, it seems that it is doing additional parsing.
We should keep what is in "struct commit" and parsing overhead to
the minimum, as it matters to performance (especially when auxiliary
data structures like commit-graph are not available for the part of
history). If some pieces of data (like "from this byte to the end
is %(body)") do not matter in commit traversal, they are optional,
and (1) we should not always parse them out, instead we should do so
only on demand, and (2) we should not add members for them in the
commit object, but use commit slabs to store them.
As to the slab, it is not like you have to have one slab for each of
these optional fields you may want to parse. If for example you need the
authorship ident and timestamp, even if you do not need committer
ident and timestamp, it is plausible to have a type of slab that
holds these four data items together (and only fill parts of them
that are actually requested by the callers). Also, things that are
strings may want to be stored as a relative offset into the commit
buffer, instead of duplicated copies of string values.
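
The suggestion above could be sketched roughly as follows. This is a
standalone toy, not Git's commit-slab API: `struct ident_slab`,
`ident_slab_at()`, `get_idents()` and the `WANT_*` flags are hypothetical
names, and the side table is a plain growable array indexed by a
per-commit number. One entry holds all four items (author ident and
timestamp, committer ident and timestamp) as offsets into the commit
buffer, and only the requested parts are filled in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request flags: which parts the caller asked for. */
#define WANT_AUTHOR    (1u << 0)
#define WANT_COMMITTER (1u << 1)

/* Offsets into the commit buffer instead of copied strings. */
struct person_span {
	size_t ident_off, ident_len;  /* "Name <email>" */
	size_t date_off, date_len;    /* "timestamp tz" */
};

struct ident_info {
	unsigned filled;              /* WANT_* bits already parsed */
	struct person_span author;
	struct person_span committer;
};

/* Toy side table; zeroed entries mean "nothing parsed yet". */
struct ident_slab {
	struct ident_info *entries;
	size_t nr;
};

/* Return the entry for an index, growing the table on demand.
 * Note: growth may move entries, so earlier pointers become stale. */
static struct ident_info *ident_slab_at(struct ident_slab *s, size_t index)
{
	if (index >= s->nr) {
		size_t nr = index + 1;
		struct ident_info *e = realloc(s->entries, nr * sizeof(*e));

		if (!e)
			return NULL;
		memset(e + s->nr, 0, (nr - s->nr) * sizeof(*e));
		s->entries = e;
		s->nr = nr;
	}
	return &s->entries[index];
}

/* Find the header line starting with key and record its two spans. */
static void fill_span(const char *buf, const char *key,
		      struct person_span *out)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (*p && *p != '\n') {
		const char *eol = strchr(p, '\n');

		if (!eol)
			eol = p + strlen(p);
		if (!strncmp(p, key, klen)) {
			const char *val = p + klen, *gt = eol, *date;

			/* Step back to just past the last '>' on the line. */
			while (gt > val && gt[-1] != '>')
				gt--;
			if (gt == val)
				gt = eol; /* no '>': whole value is the ident */
			date = gt < eol ? gt + 1 : eol;
			out->ident_off = (size_t)(val - buf);
			out->ident_len = (size_t)(gt - val);
			out->date_off = (size_t)(date - buf);
			out->date_len = (size_t)(eol - date);
			return;
		}
		p = *eol ? eol + 1 : eol;
	}
}

/* Parse only the parts that were requested and are not yet filled. */
static const struct ident_info *get_idents(struct ident_slab *s, size_t index,
					   const char *buf, unsigned want)
{
	struct ident_info *info;
	unsigned missing;

	assert(s && buf);
	info = ident_slab_at(s, index);
	if (!info)
		return NULL;
	missing = want & ~info->filled;
	if (missing & WANT_AUTHOR)
		fill_span(buf, "author ", &info->author);
	if (missing & WANT_COMMITTER)
		fill_span(buf, "committer ", &info->committer);
	info->filled |= missing;
	return info;
}
```

Asking for `WANT_AUTHOR` first and `WANT_AUTHOR | WANT_COMMITTER` later
parses the author line only once; the second call fills in just the
committer spans.
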