* What is an efficient way to get all blobs / trees that have notes attached?
From: Sebastian Schuberth @ 2016-04-01 10:51 UTC
To: git
Hi,
I'm curious whether there's a more efficient way to get a list of blobs
/ trees (and their names) that have notes attached than doing this:
1) Get all notes refs I'm interested in (git-for-each-ref).
2) For each notes ref, get the list of notes (git-notes list) and store
them in a hash table that maps object hashes to notes.
3) Recursively list all blobs / trees (git-ls-tree) and check whether an
object's hash is contained in our table to get its notes.
In particular 3) could be expensive for repos with a lot of files as
we're looking at all of them just to see whether they have notes attached.
Regards,
Sebastian
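
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names match_notes_to_paths and list_annotated_paths are made up for this sketch and are not git commands, only the tree of a single commit-ish is scanned, and paths are assumed to contain no characters that git-ls-tree would quote.

```shell
# Sketch only: both function names are illustrative, not part of git.

# Join a notes list with a tree listing.
#   $1    : file holding `git notes list` output ("<note id> <annotated id>")
#   stdin : `git ls-tree -r -t` output ("<mode> <type> <id><TAB><path>")
# Prints "<type> <object id> <note id> <path>" for each annotated entry.
match_notes_to_paths() {
    awk -v notes="$1" '
        BEGIN {
            # Step 2: hash table mapping annotated object id to note id.
            while ((getline line < notes) > 0) {
                split(line, f, " ")
                note[f[2]] = f[1]
            }
            close(notes)
        }
        {
            # Step 3: the path is everything after the first TAB.
            tab = index($0, "\t")
            split(substr($0, 1, tab - 1), meta, " ")
            path = substr($0, tab + 1)
            if (meta[3] in note)
                print meta[2], meta[3], note[meta[3]], path
        }
    '
}

# Steps 1-3 for every notes ref, scanning the given commit-ish (default HEAD).
list_annotated_paths() {
    rev=${1:-HEAD}
    tmp=$(mktemp) || return 1
    git for-each-ref --format='%(refname)' refs/notes |
    while read -r notes_ref
    do
        echo "--- $notes_ref ---"
        git notes --ref="$notes_ref" list >"$tmp"
        git ls-tree -r -t --full-tree "$rev" | match_notes_to_paths "$tmp"
    done
    rm -f "$tmp"
}
```

Run inside a repository, `list_annotated_paths HEAD` would print one line per path of every annotated blob or tree, so a blob that exists under several paths shows up once per path. The cost is still one full tree listing per notes ref.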
* Re: What is an efficient way to get all blobs / trees that have notes attached?
From: Johan Herland @ 2016-04-01 12:16 UTC
To: Sebastian Schuberth; +Cc: Git mailing list
On Fri, Apr 1, 2016 at 12:51 PM, Sebastian Schuberth
<sschuberth@gmail.com> wrote:
> Hi,
>
> I'm curious whether there's a more efficient way to get a list of blobs /
> trees (and their names) that have notes attached than doing this:
>
> 1) Get all notes refs I'm interested in (git-for-each-ref).
>
> 2) For each notes ref, get the list of notes (git-notes list) and store them
> in a hash table that maps object hashes to notes.
>
> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
> object's hash is contained in our table to get its notes.
>
> In particular 3) could be expensive for repos with a lot of files as we're
> looking at all of them just to see whether they have notes attached.
In (3), why would you need to search through _all_ blobs/trees? Would
it not be cheaper to simply query the object type of each annotated
object from (2)? I.e. something like:
    for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
    do
        echo "--- $notes_ref ---"
        for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
        do
            type=$(git cat-file -t "$annotated_obj")
            if test "$type" != "commit"
            then
                echo "$annotated_obj: $type"
            fi
        done
    done
Can probably be made even faster by using the --batch option to cat-file...
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: What is an efficient way to get all blobs / trees that have notes attached?
From: Johan Herland @ 2016-04-01 12:23 UTC
To: Sebastian Schuberth; +Cc: Git mailing list
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
> echo "--- $notes_ref ---"
> for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
> do
> type=$(git cat-file -t "$annotated_obj")
> if test "$type" != "commit"
> then
> echo "$annotated_obj: $type"
> fi
> done
> done
>
> Can probably be made even faster by using the --batch option to cat-file...
For example:
    for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
    do
        echo "--- $notes_ref ---"
        git notes --ref=$notes_ref list | cut -c 42- |
            git cat-file --batch-check="%(objecttype) %(objectname)" |
            grep '^\(\(blob\)\|\(tree\)\) '
    done
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
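
The fixed column offsets in the loop above assume 40-character SHA-1 object names and notes refs that point at commits. A possible variant that avoids counting columns, offered as a sketch only: the helper names annotated_ids, only_blobs_and_trees and list_annotated_blobs_and_trees are invented here, and grep -E is swapped in for the escaped alternation.

```shell
# Sketch only: the function names are illustrative, not git commands.

# Print the annotated object id (second field) of each
# `git notes list` line read from stdin.
annotated_ids() {
    awk '{ print $2 }'
}

# Keep only the "blob ..." and "tree ..." lines of
# `git cat-file --batch-check` output read from stdin.
only_blobs_and_trees() {
    grep -E '^(blob|tree) '
}

# For every notes ref, list the annotated objects that are blobs or trees.
list_annotated_blobs_and_trees() {
    git for-each-ref --format='%(refname)' refs/notes |
    while read -r notes_ref
    do
        echo "--- $notes_ref ---"
        git notes --ref="$notes_ref" list |
            annotated_ids |
            git cat-file --batch-check='%(objecttype) %(objectname)' |
            only_blobs_and_trees
    done
}
```

Asking for-each-ref for `%(refname)` directly and taking the second whitespace-separated field of the notes list should keep the loop independent of the hash length.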
* Re: What is an efficient way to get all blobs / trees that have notes attached?
From: Sebastian Schuberth @ 2016-04-04 7:46 UTC
To: Johan Herland; +Cc: Git mailing list
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:
>> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
>> object's hash is contained in our table to get its notes.
>>
>> In particular 3) could be expensive for repos with a lot of files as we're
>> looking at all of them just to see whether they have notes attached.
>
> In (3), why would you need to search through _all_ blobs/trees? Would
> it not be cheaper to simply query the object type of each annotated
> object from (2)? I.e. something like:
>
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
> echo "--- $notes_ref ---"
> for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
> do
> type=$(git cat-file -t "$annotated_obj")
> if test "$type" != "commit"
> then
> echo "$annotated_obj: $type"
> fi
> done
> done
Thanks for the idea. The problem is that I do want to list the notes
by path of the object they belong to. As a blob could potentially
belong to more than one path (copies of files in the repo), I do not
see another way of getting that information other than iterating over
all blobs and checking what path(s) they belong to.
--
Sebastian Schuberth
* Re: What is an efficient way to get all blobs / trees that have notes attached?
From: Johan Herland @ 2016-04-04 17:33 UTC
To: Sebastian Schuberth; +Cc: Git mailing list
On Mon, Apr 4, 2016 at 9:46 AM, Sebastian Schuberth
<sschuberth@gmail.com> wrote:
> On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:
>>> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
>>> object's hash is contained in our table to get its notes.
>>>
>>> In particular 3) could be expensive for repos with a lot of files as we're
>>> looking at all of them just to see whether they have notes attached.
>>
>> In (3), why would you need to search through _all_ blobs/trees? Would
>> it not be cheaper to simply query the object type of each annotated
>> object from (2)? I.e. something like:
>>
>> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
>> do
>> echo "--- $notes_ref ---"
>> for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>> do
>> type=$(git cat-file -t "$annotated_obj")
>> if test "$type" != "commit"
>> then
>> echo "$annotated_obj: $type"
>> fi
>> done
>> done
>
> Thanks for the idea. The problem is that I do want to list the notes
> by path of the object they belong to. As a blob could potentially
> belong to more than one path (copies of files in the repo), I do not
> see another way of getting that information other than iterating over
> all blobs and checking what path(s) they belong to.
True; fundamentally what you want is a blob/tree ID -> path(s) mapping,
which is an independent problem, unrelated to the initial notes lookup.
I don't know of a solution faster than the brute-force search you already
sketched. If this lookup is important to your use case, you could consider
building/caching the required mapping when the notes are added in the
first place, but I don't know if that is possible in your scenario...
...Johan
> --
> Sebastian Schuberth
--
Johan Herland, <johan@herland.net>
www.herland.net
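
One way the ID -> path(s) mapping discussed above might be cached is sketched below. Note the swapped-in technique: instead of recording paths at the moment a note is added, this builds the mapping once per tree and reuses it. The function names and the path-index-cache directory are hypothetical, and paths are assumed to contain no characters that git-ls-tree would quote.

```shell
# Sketch only: function names and the cache layout are hypothetical.

# Turn `git ls-tree -r -t` output on stdin into
# "<object id><TAB><path>" lines, sorted by id and then by path.
index_paths_by_id() {
    awk '{
        tab = index($0, "\t")
        split(substr($0, 1, tab - 1), meta, " ")
        print meta[3] "\t" substr($0, tab + 1)
    }' | LC_ALL=C sort
}

# Print every path recorded for object id $1 in index file $2.
paths_for_id() {
    awk -F '\t' -v id="$1" '$1 == id { print substr($0, length(id) + 2) }' "$2"
}

# Build the index for a commit-ish once, keyed by its tree id,
# and print the name of the cache file.
cached_path_index() {
    rev=${1:-HEAD}
    cache_dir=$(git rev-parse --git-dir)/path-index-cache  # hypothetical location
    tree_id=$(git rev-parse "$rev^{tree}") || return 1
    index_file=$cache_dir/$tree_id
    if ! test -f "$index_file"
    then
        mkdir -p "$cache_dir" &&
        git ls-tree -r -t --full-tree "$rev" | index_paths_by_id >"$index_file"
    fi
    echo "$index_file"
}
```

With this in place, something like `paths_for_id "$annotated_obj" "$(cached_path_index HEAD)"` would answer each lookup from the cached file, so the full tree walk happens once per distinct tree rather than once per query. The brute-force scan is still needed whenever the tree changes.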