All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>
Subject: [RFC] Possible idea for GSoC 2020
Date: Tue, 10 Mar 2020 15:50:24 +0100	[thread overview]
Message-ID: <86mu8o8dsf.fsf@gmail.com> (raw)

Hello,

Here below is a possible proposal for a more difficult Google Summer of
Code 2020 project.

A few questions:
- is it too late to propose a new project idea for GSoC 2020?
- is it too difficult of a project for GSoC?

Best,

  Jakub Narębski

--------------------------------------------------

### Graph labelling for speeding up git commands

 - Language: C
 - Difficulty: hard / difficult
 - Possible mentors: Jakub Narębski

Git uses various clever methods for making operations on very large
repositories faster, from bitmap indices for git-fetch[1], to generation
numbers (also known as topological levels) in the commit-graph file for
commit graph traversal operations like `git log --graph`[2].

One possible improvement that can make Git even faster is using min-post
intervals labelling.  The basis of this labelling is post-visit order of
a depth-first search traversal tree of a commit graph, let's call it
'post(v)'.

If for each commit 'v' we would compute and store in the commit-graph
file two numbers: 'post(v)' and the minimum of 'post(u)' for all commits
reachable from 'v', let's call the latter 'min_graph(v)', then the
following condition is true:

  if 'v' can reach 'u', then min_graph(v) <= post(u) <= post(v)

If for each commit 'v' we would compute and store in the commit-graph
file two numbers: 'post(v)' and the minimum of 'post(u)' for commits
that were visited during the part of depth-first search that started
from 'v' (which is the minimum of post-order number for subtree of a
spanning tree that starts at 'v').  Let's call the later 'min_tree(v)'.
Then the following condition is true:

  if min_tree(v) <= post(u) <= post(v), then 'v' can reach 'u'

The task would be to implement computing such labelling (or a more
involved variant of it[3][4]), storing it in commit-graph file, and
using it for speeding up git commands (starting from a single chosen
command) such as:

 - git merge-base --is-ancestor A B
 - git branch --contains A
 - git tag --contains A
 - git branch --merged A
 - git tag --merged A
 - git merge-base --all A B
 - git log --topo-sort

References:

1. <http://githubengineering.com/counting-objects/>
2. <https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph-iii-generations/>
3. <https://arxiv.org/abs/1404.4465>
4. <https://github.com/steps/Ferrari>

See also discussion in:

<https://public-inbox.org/git/86tvl0zhos.fsf@gmail.com/t/>

             reply	other threads:[~2020-03-10 14:50 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 14:50 Jakub Narebski [this message]
2020-03-11 19:03 ` [RFC] Possible idea for GSoC 2020 Junio C Hamano
2020-03-13 10:56   ` Jakub Narebski
2020-03-15 14:26     ` Jakub Narebski
2020-03-17 12:24       ` Kaartic Sivaraam
2020-03-17 12:39         ` Kaartic Sivaraam
2020-03-17 14:22         ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30   ` Jakub Narebski
2020-03-11 21:48     ` Christian Couder
2020-03-12 12:17       ` Jakub Narebski
2020-03-12 13:08         ` Jakub Narebski
2020-03-13 10:59           ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34   ` Jakub Narebski
2020-03-15 18:57     ` Philip Oakley
2020-03-15 21:14       ` Jakub Narebski
2020-03-16 14:47         ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17  3:13   ` Jakub Narebski
2020-03-17  7:24     ` Christian Couder
2020-03-17 11:49       ` Derrick Stolee
2020-03-17 14:18       ` Jakub Narebski
2020-03-17 17:04         ` Christian Couder
2020-03-18 13:55           ` Jakub Narebski
2020-03-18 15:25             ` Derrick Stolee
2020-03-19 12:52               ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
     [not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86mu8o8dsf.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.