All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Abhishek Kumar <abhishekkumar8222@gmail.com>
Cc: Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Thu, 19 Mar 2020 13:50:00 +0100	[thread overview]
Message-ID: <86eeto5x1j.fsf@gmail.com> (raw)
In-Reply-To: <CAHk66fv-fRZdUFC978sKUT_b7VYi6S2f2vdTQ6iB-wcCznnBHg@mail.gmail.com> (Abhishek Kumar's message of "Tue, 17 Mar 2020 23:30:00 +0530")

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> Greetings Jakub
>
>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?  Which should be
>> put first, then?
>
> I would suggest working on generation number v2 first because:
> - We ship improved performance *sooner*.
> - I find it easier to shift from simpler to more complex systems.

It would be also more of an engineering project, than research one.

> On a personal note, I would be able to do a better job at working on generation
> number v2 than on interval labels. While reading through the papers mentioned
> was fun and full of a-ha moments, I find building things more
> fulfilling. I would be glad if either you or Derrick opt for mentoring it.

All right.  I have added 'Generation number v2' as possible way of
working on the "Commit graph labeling for speeding up git commands"
idea.

https://git.github.io/SoC-2020-Ideas/#commit-graph-labeling-for-speeding-up-git-commands


> You could read my proposal at the link below. It is very rough and I haven't
> proofread it yet. I will send out a more formal proposal once the direction
> of this project is decided.
>
> https://github.com/abhishekkumar2718/GSoC20/blob/master/graph_labelling.md
>
> **Too long, didn't read**

I will comment only on the TLDR version below.

> - Commit graphs are small to medium-sized (compared to problem sizes observed in
> graph-theory literature) sparse graphs. They have unusual properties compared
> to more conventional graphs that can be exploited for better performance.

Right.

> - Most of git's reachability queries are negative and using a negative-cut
> filter improves performance more than a positive-cut filter.

Not exactly.  Most time consuming git graph queries are those that
return a subgraph or otherwise need to walk the commit graph (like
topological sorting, or ahead/behind numbers).  Here in most cases
negative-cut filters, that reduce the number of commits walked, improve
performance more (with rare exceptions, like --ancestry-path query).

That is what I think, but it is not something I have proven...

> - Implementing and maintaining a two-dimensional reachability index is hard
> and does not offer justifiable performance gains.

I think it would be better to not mention graph reachability algorithms
at all, and just talk about generation number v2 as straight performance
improvement over generation number v1.

> - We plan to use corrected commit date as the generation number v2 because it is
> locally computable, immutable and can be incrementally updated.

Unfortunately because of an oopsie with respect to commit-graph file
format versioning, at least for the time being generation number v2
needs to be also backward compatibile with generation number v1.  See
for example:
  https://git.github.io/rev_news/2019/06/28/edition-52/#reviews
  https://speakerdeck.com/jnareb/graph-operations-in-git-and-how-to-make-them-faster?slide=74

>
> - If git ever considers a two-dimensional reachability index, either post order
> DFS, GRAIL or an index based on commit date would be good starting
> places to explore.
>
> I go in more details about GRAIL, FERRARI and PReaCH, explaining
> briefly how they work, their advantages, and disadvantages.

Again, I don't think that is necessary.

Best,
-- 
Jakub Narębski

  reply	other threads:[~2020-03-19 12:50 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-17 18:00 [RFC] Possible idea for GSoC 2020 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski [this message]
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-13 17:30 Abhishek Kumar
2020-03-10 14:50 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86eeto5x1j.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.