* What is an efficient way to get all blobs / trees that have notes attached?
@ 2016-04-01 10:51 Sebastian Schuberth
  2016-04-01 12:16 ` Johan Herland
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Schuberth @ 2016-04-01 10:51 UTC
  To: git

Hi,

I'm curious whether there's a more efficient way to get a list of blobs 
/ trees (and their names) that have notes attached than doing this:

1) Get all notes refs I'm interested in (git-for-each-ref).

2) For each notes ref, get the list of notes (git-notes list) and store 
them in a hash table that maps object hashes to notes.

3) Recursively list all blobs / trees (git-ls-tree) and look whether an 
object's hash is contained in our table to get its notes.

In particular 3) could be expensive for repos with a lot of files as 
we're looking at all of them just to see whether they have notes attached.
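
A minimal sketch of steps 1) to 3) as a single pass per notes ref, to make
the cost concrete. The function name notes_by_path and the tab-separated
output format are illustrative assumptions, not part of git. Paths with
unusual characters are printed in git's quoted form.

```shell
# Sketch: print "<notes ref> <type> <object> <path>" (tab-separated) for every
# blob / tree reachable from a commit (default: HEAD) that has a note attached.
# notes_by_path is a hypothetical helper name, not a git command.
notes_by_path() {
    commit=${1:-HEAD}
    # Step 1: enumerate the notes refs.
    git for-each-ref --format='%(refname)' refs/notes |
    while read -r notes_ref
    do
        {
            # Step 2: "git notes list" prints "<note> <annotated object>";
            # rewrite each line to "N <annotated object>" as a marker record.
            git notes --ref="$notes_ref" list | sed 's/^[^ ]* /N /'
            # Step 3: list every blob / tree below the commit's root tree as
            # "<mode> <type> <object><TAB><path>".
            git ls-tree -r -t "$commit"
        } |
        awk -v ref="$notes_ref" '
            # Marker records fill the lookup table of annotated objects.
            $1 == "N" { annotated[$2] = 1; next }
            {
                # Split at the tab so that paths containing spaces survive.
                tab = index($0, "\t")
                split(substr($0, 1, tab - 1), meta, " ")
                if (meta[3] in annotated)
                    printf "%s\t%s\t%s\t%s\n", ref, meta[2], meta[3], substr($0, tab + 1)
            }
        '
    done
}
```

Calling notes_by_path HEAD still walks the whole tree once per notes ref,
which is exactly the cost in question.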

Regards,
Sebastian


* Re: What is an efficient way to get all blobs / trees that have notes attached?
  2016-04-01 10:51 What is an efficient way to get all blobs / trees that have notes attached? Sebastian Schuberth
@ 2016-04-01 12:16 ` Johan Herland
  2016-04-01 12:23   ` Johan Herland
  2016-04-04  7:46   ` Sebastian Schuberth
  0 siblings, 2 replies; 5+ messages in thread
From: Johan Herland @ 2016-04-01 12:16 UTC
  To: Sebastian Schuberth; +Cc: Git mailing list

On Fri, Apr 1, 2016 at 12:51 PM, Sebastian Schuberth
<sschuberth@gmail.com> wrote:
> Hi,
>
> I'm curious whether there's a more efficient way to get a list of blobs /
> trees (and their names) that have notes attached than doing this:
>
> 1) Get all notes refs I'm interested in (git-for-each-ref).
>
> 2) For each notes ref, get the list of notes (git-notes list) and store them
> in a hash table that maps object hashes to notes.
>
> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
> object's hash is contained in our table to get its notes.
>
> In particular 3) could be expensive for repos with a lot of files as we're
> looking at all of them just to see whether they have notes attached.

In (3), why would you need to search through _all_ blobs/trees? Would
it not be cheaper to simply query the object type of each annotated
object from (2)? I.e. something like:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
    do
        type=$(git cat-file -t "$annotated_obj")
        if test "$type" != "commit"
        then
            echo "$annotated_obj: $type"
        fi
    done
done

Can probably be made even faster by using the --batch option to cat-file...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: What is an efficient way to get all blobs / trees that have notes attached?
  2016-04-01 12:16 ` Johan Herland
@ 2016-04-01 12:23   ` Johan Herland
  2016-04-04  7:46   ` Sebastian Schuberth
  1 sibling, 0 replies; 5+ messages in thread
From: Johan Herland @ 2016-04-01 12:23 UTC
  To: Sebastian Schuberth; +Cc: Git mailing list

On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done
>
> Can probably be made even faster by using the --batch option to cat-file...

For example:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    git notes --ref=$notes_ref list | cut -c 42- |
        git cat-file --batch-check="%(objecttype) %(objectname)" |
        grep '^\(\(blob\)\|\(tree\)\) '
done


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: What is an efficient way to get all blobs / trees that have notes attached?
  2016-04-01 12:16 ` Johan Herland
  2016-04-01 12:23   ` Johan Herland
@ 2016-04-04  7:46   ` Sebastian Schuberth
  2016-04-04 17:33     ` Johan Herland
  1 sibling, 1 reply; 5+ messages in thread
From: Sebastian Schuberth @ 2016-04-04  7:46 UTC
  To: Johan Herland; +Cc: Git mailing list

On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:

>> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
>> object's hash is contained in our table to get its notes.
>>
>> In particular 3) could be expensive for repos with a lot of files as we're
>> looking at all of them just to see whether they have notes attached.
>
> In (3), why would you need to search through _all_ blobs/trees? Would
> it not be cheaper to simply query the object type of each annotated
> object from (2)? I.e. something like:
>
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done

Thanks for the idea. The problem is that I do want to list the notes
by path of the object they belong to. As a blob could potentially
belong to more than one path (copies of files in the repo), I do not
see another way of getting that information other than iterating over
all blobs and checking what path(s) they belong to.

-- 
Sebastian Schuberth


* Re: What is an efficient way to get all blobs / trees that have notes attached?
  2016-04-04  7:46   ` Sebastian Schuberth
@ 2016-04-04 17:33     ` Johan Herland
  0 siblings, 0 replies; 5+ messages in thread
From: Johan Herland @ 2016-04-04 17:33 UTC
  To: Sebastian Schuberth; +Cc: Git mailing list

On Mon, Apr 4, 2016 at 9:46 AM, Sebastian Schuberth
<sschuberth@gmail.com> wrote:
> On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland <johan@herland.net> wrote:
>>> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
>>> object's hash is contained in our table to get its notes.
>>>
>>> In particular 3) could be expensive for repos with a lot of files as we're
>>> looking at all of them just to see whether they have notes attached.
>>
>> In (3), why would you need to search through _all_ blobs/trees? Would
>> it not be cheaper to simply query the object type of each annotated
>> object from (2)? I.e. something like:
>>
>> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
>> do
>>     echo "--- $notes_ref ---"
>>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>>     do
>>         type=$(git cat-file -t "$annotated_obj")
>>         if test "$type" != "commit"
>>         then
>>             echo "$annotated_obj: $type"
>>         fi
>>     done
>> done
>
> Thanks for the idea. The problem is that I do want to list the notes
> by path of the object they belong to. As a blob could potentially
> belong to more than one path (copies of files in the repo), I do not
> see another way of getting that information other than iterating over
> all blobs and checking what path(s) they belong to.

True; fundamentally what you want is a blob/tree ID -> path(s) mapping,
which is an independent problem, unrelated to the initial notes lookup.

I don't know of a solution faster than the brute-force search you already
sketched. If this lookup is important to your use case, you could consider
building/caching the required mapping when the notes are added in the
first place, but I don't know if that is possible in your scenario...
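
One possible reading of the caching idea, as a rough sketch only: rather
than recording paths at the moment a note is added, it caches the
object -> path(s) listing once per root tree, so repeated lookups against
the same tree skip the ls-tree walk. The helper names
(cached_object_paths, paths_for_object) and the cache location under the
git directory are assumptions made for illustration; git itself provides
no such cache.

```shell
# Sketch: cache "<object><TAB><path>" lines per root tree.
# Both function names and the cache directory are hypothetical.
cached_object_paths() {
    commit=${1:-HEAD}
    tree=$(git rev-parse --verify "$commit^{tree}") || return 1
    cache_dir=$(git rev-parse --git-dir)/object-path-cache
    cache=$cache_dir/$tree
    if ! test -f "$cache"
    then
        # ls-tree prints "<mode> <type> <object><TAB><path>"; keep everything
        # from the third space-separated field on, i.e. "<object><TAB><path>".
        mkdir -p "$cache_dir" &&
        git ls-tree -r -t "$tree" | cut -d ' ' -f 3- >"$cache.tmp" &&
        mv "$cache.tmp" "$cache" || return 1
    fi
    cat "$cache"
}

# Print every path in the given commit (default: HEAD) whose object ID is $1.
paths_for_object() {
    cached_object_paths "${2:-HEAD}" |
    awk -F '\t' -v id="$1" '$1 == id { print $2 }'
}
```

Because a tree hash identifies its contents, a cache file keyed by that hash
never goes stale; a new commit with a different root tree simply gets its own
file. The first lookup per tree still pays for the full walk.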


...Johan

> --
> Sebastian Schuberth

-- 
Johan Herland, <johan@herland.net>
www.herland.net
