From: Jakub Narebski <jnareb@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Derrick Stolee <stolee@gmail.com>, git <git@vger.kernel.org>,
Heba Waly <heba.waly@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Jonathan Tan <jonathantanmy@google.com>,
Emily Shaffer <emilyshaffer@google.com>,
Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: [RFC] Possible idea for GSoC 2020
Date: Wed, 18 Mar 2020 14:55:54 +0100 [thread overview]
Message-ID: <86mu8d6a39.fsf@gmail.com> (raw)
In-Reply-To: <CAP8UFD1ihR1PtM2y24HTKa2woGXMVFq9MoVSj1cHVZCNWSH7EA@mail.gmail.com> (Christian Couder's message of "Tue, 17 Mar 2020 18:04:23 +0100")
Christian Couder <christian.couder@gmail.com> writes:
> On Tue, Mar 17, 2020 at 3:18 PM Jakub Narebski <jnareb@gmail.com> wrote:
>> Christian Couder <christian.couder@gmail.com> writes:
>>> On Tue, Mar 17, 2020 at 4:13 AM Jakub Narebski <jnareb@gmail.com> wrote:
[...]
>>>> As I wrote below, such positive-cut filter would be directly helpful in
>>>> performing the following commands:
>>>>
>>>> - `git merge-base --is-ancestor`
>>>> - `git branch --contains`
>>>> - `git tag --contains`
>>>> - `git branch --merged`
>>>> - `git tag --merged`
>>>>
>>>> It would be also useful for tag autofollow in git-fetch; is is N-to-M
>>>> equivalent to 1-to-N / N-to-1 `--contains` queries.
>>>>
>>>> I am quite sure that positive-cut filter would make `--ancestry-path`
>>>> walk faster.
>>>>
>>>> I think, but I am not sure, that positive-cut filter can make parts of
>>>> topological sort and merge base algorithms at least a tiny bit faster.
>>>
>>> Is there an easy way to check that it would provide significant
>>> performance improvements at least in some cases? Can we ask the
>>> student to do that at the beginning of the GSoC?
>>
>> The "Reachability labels for version control graphs.ipynb" Jupyter
>> Notebook on Google Colaboratory was created to answer this question
>> (originally for the FELINE reachability index). Among others it can
>> min-post interval labels and topological levels (generation numbers),
>> use them for reachability queries, and load Linux kernel
>> commit-graph. The exploration didn't get finished, but it would be not
>> that difficult, I think, to at least find the amount of false negatives
>> for min-post interval labeling for git.git or Linux kernel repo.
>>
>> https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
>>
>> As Jupyter Notebook, it is run in the web browser. It can either use
>> local runtime, or run on Google Cloud runtime. On the other hand it
>> requires at least some knowledge of Python...
>
> Ok, if that's a possible approach to the project, then please add it
> to the description.
Added as a new paragraph in the 'interval labels' subtask description
https://github.com/git/git.github.io/commit/ac526b18b97e53a431767ce147f27eaf6af0aa76
[...]
>>>>> My recommendation is that a GSoC student update the
>>>>> generation number to "v2" based on the definition you made in [1].
>>>>> That proposal is also more likely to be effective in Git because
>>>>> it makes use of extra heuristic information (commit date) to
>>>>> assist the types of algorithms we care about.
>>>>>
>>>>> In that case, the "difficult" part is moving the "generation"
>>>>> member of struct commit into a slab before making it a 64-bit
>>>>> value. (This is likely necessary for your plan, anyway.) Updating
>>>>> the generation number to v2 is relatively straight-forward after
>>>>> that, as someone can follow all places that reference or compute
>>>>> generation numbers and apply a diff
>>>>
>>>> Good idea! Though I am not sure if it is not too late to add it to the
>>>> https://git.github.io/SoC-2020-Ideas/ as the self imposed deadline of
>>>> March 16 (where students can start submitting proposals to GSoC) has
>>>> just passed. Christian, what do you think?
>>>
>>> Would that be a different project idea or part of your "Graph labeling
>>> for speeding up git commands" project idea?
>>>
>>> I am very reluctant to add new project ideas at that time. I don't
>>> think student will have time to properly research it and get it
>>> reviewed.
[...]
>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?
>
> Ok for that.
>
>> Which should be
>> put first, then?
>
> Which ever you prefer. If you say that any part could be done
> separately, that doesn't matter much.
I have added 'Generation number v2' as one of alternative ways of
working on the more generic "Commit graph labeling for speeding up git
commands" idea -- as first task, because it fit better the narrative:
https://github.com/git/git.github.io/commit/a6d59820709417081c334a5120342987ff046f1a
Could you (or Stolee) check current proposal, so that it can be merged
in? Thanks in advance.
https://github.com/git/git.github.io/blob/soc-2020-idea-jnareb/SoC-2020-Ideas.md
Best,
--
Jakub Narębski
next prev parent reply other threads:[~2020-03-18 13:56 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-10 14:50 [RFC] Possible idea for GSoC 2020 Jakub Narebski
2020-03-11 19:03 ` Junio C Hamano
2020-03-13 10:56 ` Jakub Narebski
2020-03-15 14:26 ` Jakub Narebski
2020-03-17 12:24 ` Kaartic Sivaraam
2020-03-17 12:39 ` Kaartic Sivaraam
2020-03-17 14:22 ` Jakub Narebski
2020-03-11 20:29 ` Christian Couder
2020-03-11 21:30 ` Jakub Narebski
2020-03-11 21:48 ` Christian Couder
2020-03-12 12:17 ` Jakub Narebski
2020-03-12 13:08 ` Jakub Narebski
2020-03-13 10:59 ` Jakub Narebski
2020-03-13 13:08 ` Philip Oakley
2020-03-13 14:34 ` Jakub Narebski
2020-03-15 18:57 ` Philip Oakley
2020-03-15 21:14 ` Jakub Narebski
2020-03-16 14:47 ` Philip Oakley
2020-03-16 12:44 ` Derrick Stolee
2020-03-17 3:13 ` Jakub Narebski
2020-03-17 7:24 ` Christian Couder
2020-03-17 11:49 ` Derrick Stolee
2020-03-17 14:18 ` Jakub Narebski
2020-03-17 17:04 ` Christian Couder
2020-03-18 13:55 ` Jakub Narebski [this message]
2020-03-18 15:25 ` Derrick Stolee
2020-03-19 12:52 ` Jakub Narebski
-- strict thread matches above, loose matches on Subject: below --
2020-03-13 17:30 Abhishek Kumar
2020-03-17 17:00 Abhishek Kumar
2020-03-17 18:05 ` Jakub Narebski
2020-03-17 18:00 Abhishek Kumar
2020-03-19 12:50 ` Jakub Narebski
[not found] <CAHk66ftQqFqP-4kd4-8cHtCMEofSUvbeSQ24pcCCrkz7+2JG1w@mail.gmail.com>
2020-03-27 18:31 ` Jakub Narębski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86mu8d6a39.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=abhishekkumar8222@gmail.com \
--cc=christian.couder@gmail.com \
--cc=emilyshaffer@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=heba.waly@gmail.com \
--cc=jonathantanmy@google.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.