All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Christian Couder <christian.couder@gmail.com>,
	Derrick Stolee <stolee@gmail.com>,
	Heba Waly <heba.waly@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Emily Shaffer <emilyshaffer@google.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Fri, 13 Mar 2020 11:56:51 +0100	[thread overview]
Message-ID: <86tv2s34lo.fsf@gmail.com> (raw)
In-Reply-To: <xmqqo8t2hfxj.fsf@gitster.c.googlers.com> (Junio C. Hamano's message of "Wed, 11 Mar 2020 12:03:52 -0700")

Cc: Stolee, Heba, Jonathan T., Emily Shaffer. 

Junio C Hamano <gitster@pobox.com> writes:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> A few questions:
>> - is it too late to propose a new project idea for GSoC 2020?
>> - is it too difficult of a project for GSoC?
>> ...
>> ### Graph labelling for speeding up git commands
>>
>>  - Language: C
>>  - Difficulty: hard / difficult
>>  - Possible mentors: Jakub Narębski
>
> I am not running the GSoC or participating in it in any way other
> than just being a reviewer-maintainer of the project, but I would
> appreciate a well-thought-out write-up very much.

I have prepared slides for "Graph operations in Git version control
system" (PDF), mainly describing what was already done to improve their
performance, but they also include a few thoughts about the future (like
additional graph reachability labelings)... unfortunately the slides are
in Polish, not in English.

If there is interest, I could translate them, and put the result
somewhere accessible.

Or I could try to make this information into blog post -- this topic
would really gain from using images (like Derrick Stolee series of
articles on commit-graph).

>> Git uses various clever methods for making operations on very large
>> repositories faster, from bitmap indices for git-fetch[1], to generation
>> numbers (also known as topological levels) in the commit-graph file for
>> commit graph traversal operations like `git log --graph`[2].
>>
>> One possible improvement that can make Git even faster is using min-post
>> intervals labelling.  The basis of this labelling is post-visit order of
>> a depth-first search traversal tree of a commit graph, let's call it
>> 'post(v)'.
>>
>> If for each commit 'v' we would compute and store in the commit-graph
>> file two numbers: 'post(v)' and the minimum of 'post(u)' for all commits
>> reachable from 'v', let's call the latter 'min_graph(v)', then the
>> following condition is true:
>>
>>   if 'v' can reach 'u', then min_graph(v) <= post(u) <= post(v)
>>
>> If for each commit 'v' we would compute and store in the commit-graph
>> file two numbers: 'post(v)' and the minimum of 'post(u)' for commits
>> that were visited during the part of depth-first search that started
>> from 'v' (which is the minimum of post-order number for subtree of a
>> spanning tree that starts at 'v').  Let's call the later 'min_tree(v)'.
>> Then the following condition is true:
>>
>>   if min_tree(v) <= post(u) <= post(v), then 'v' can reach 'u'
>>
>> The task would be to implement computing such labelling (or a more
>> involved variant of it[3][4]), storing it in commit-graph file, and
>> using it for speeding up git commands (starting from a single chosen
>> command) such as:
>>
>>  - git merge-base --is-ancestor A B
>>  - git branch --contains A
>>  - git tag --contains A
>>  - git branch --merged A
>>  - git tag --merged A
>>  - git merge-base --all A B
>>  - git log --topo-sort
>>
>> References:
>>
>> 1. <http://githubengineering.com/counting-objects/>
>> 2. <https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph-iii-generations/>
>> 3. <https://arxiv.org/abs/1404.4465>
>> 4. <https://github.com/steps/Ferrari>
>>
>> See also discussion in:
>>
>> <https://public-inbox.org/git/86tvl0zhos.fsf@gmail.com/t/>

P.S. A bit more expanded writeup now available
at https://git.github.io/SoC-2020-Ideas/

Best,
-- 
Jakub Narębski

  reply	other threads:[~2020-03-13 10:57 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 14:50 [RFC] Possible idea for GSoC 2020 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski [this message]
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86tv2s34lo.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=heba.waly@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.