From: Thomas Gummerer <t.gummerer@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
Christian Couder <christian.couder@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Stefan Beller <sbeller@google.com>
Subject: Re: GSoC 2016: applications open, deadline = Fri, 19/2
Date: Fri, 19 Feb 2016 12:46:57 +0100
Message-ID: <20160219114657.GG1831@hank>
In-Reply-To: <1CE3F5E2-DDCC-4F1B-93CF-1A4A194650BF@gmail.com>
On 02/18, Lars Schneider wrote:
>
> On 17 Feb 2016, at 19:58, Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> wrote:
>
> > Lars Schneider <larsxschneider@gmail.com> writes:
> >
> >> Coincidentally I started working on similar thing already (1) and I have
> >> lots of ideas around it.
> >
> > I guess it's time to start sharing these ideas then ;-).
> >
> > I think there's a lot to do. If we want to push this idea as a GSoC
> > project, we need:
> >
> > * A rough plan. We can't expect students to read a vague text like
> > "let's make Git safer" and write a real proposal out of it.
> >
> > * A way to start this rough plan incrementally (i.e. first step should
> > be easy and mergeable without waiting for next steps).
> >
> > Feel free to start writing an idea for
> > http://git.github.io/SoC-2016-Ideas/. It'd be nice to have a few more
> > ideas before Friday. We can polish them later if needed.
>
> I published my ideas here:
> https://github.com/git/git.github.io/pull/125/files
Sorry for posting my idea so late, but it took me a while to write
this all up, and life has a habit of getting in the way. My idea goes
in a different direction than yours.
I do like the remote whitelist/blacklist project.
Junio pointed out to me off list that this is too complicated for a
GSoC project. I kind of agree with that, but I wanted to see how this
could be split up, to completely convince myself as well. And indeed,
the more I think about it the more risky it seems.
Below are some thoughts on a potential design, in case someone is
interested. There is no code to back any of this up, sorry.
Everything proposed below should be hidden behind some configuration
variable, potentially one per command (?)
- start with git-clean. It's well defined which files are cleaned
from a repository when running the command. Add them to a commit on
the tip of the current branch.
Start a new branch (or use the existing one if applicable) in
refs/restore/history, and add a commit including a notes file. The
commit message contains the operation that was executed (clean in
this case), and the hash of the commit we created which includes the
cleaned files.
Add a note to the commit, detailing which command we come from and
which files we added (not strictly necessary, as we can infer it
from the parent commit).
Useful in itself as the user can recover the files manually if
needed, and can be sent as separate patch series.
Potential problems: Git has no way to track directories. This can
be mitigated by keeping the list of directories in the attached
note.
- add a git recover command. This would look like `git recover
<commit>`, where commit is the hash of the commit we saved before.
This works by reading the note attached to the commit, figuring out
which command was run before, and restoring the state we were in
before.
Potential problems: conflicts, but I think this can be solved by
simply erroring out, at least in the first iteration.
- the next command could be git mv -f, git reset --hard and friends. It
gets more tricky here, as we'll have to deal with the state of the
files in the index.
Analogous to git clean, the changes in the working tree are all
staged and added to a new commit on the tip of the current branch.
The note on this commit needs to contain the necessary data to
rebuild the state in the index. The format is more closely
specified below. We also need the corresponding changes in the
git restore command.
Restored files will be written to disk as racily smudged, so the
contents are checked by git, as we lost the meta-data anyway. This
comes at a slight performance impact, but I think that's okay as we
potentially saved the user a lot of time re-doing all the changes.
- git branch/tag --force. Store the name and the old location of the
branch in refs/restore/history. There are no files lost with this
operation, so no additional commits as for git clean or git reset
etc. are needed. The format of the commit depends on the exact
operation that was forced, for exact format see below.
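To make the first step (git-clean) a bit more concrete, here is a minimal
Python sketch of how the files and directories about to be cleaned could be
collected before they are committed. It assumes the candidates are taken from
the output of `git clean -n -d`, where each line starts with "Would remove "
and directories end in "/". The function name is made up for illustration, and
paths that git prints in quoted form are not handled.

```python
def split_clean_candidates(dry_run_output):
    """Split `git clean -n -d` output into (files, directories).

    Directories are kept separately because git cannot track them;
    the proposal stores them in the note attached to the commit.
    """
    prefix = "Would remove "
    files, directories = [], []
    for line in dry_run_output.splitlines():
        if not line.startswith(prefix):
            continue  # skip anything that is not a removal candidate
        path = line[len(prefix):]
        if path.endswith("/"):
            directories.append(path.rstrip("/"))
        else:
            files.append(path)
    return files, directories


# Hand-written sample in the shape of a dry run, not real output.
sample = "Would remove build/\nWould remove notes.txt\nWould remove tmp/cache/\n"
files, directories = split_clean_candidates(sample)
```

The files would go into the commit on the tip of the current branch, while
the directory list would only be recorded in the note.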
This treatment can't make all operations safe. Any operation that
touches the remote is hard to undo as some users already might have
fetched the new state of the remote (e.g. git push -f). Others such
as git-gc will inevitably delete information from the disk, but
changing that
There's more, but I don't think just writing up all commands without
any code would make any sense.
Formats:
- commits in refs/restore/history:
empty commits with the following commit message format for git-clean
and git-reset and friends:
$versionnumber\n
$command\n
$branchname\n
$sha1ofreferencedcommit\n
empty commits with the following commit message format for git branch
and friends
$versionnumber\n
$command\n (this includes the exact operation that was forced,
e.g. move, delete etc.)
$branchname\n
$sha1ThatWasReferencedByTheBranch\n
$overwrittenbranchname\n (this and the sha1 below are only used for
--move)
$sha1ReferencedByOverwrittenBranch\n
- notes file: The format can be different for different commands, as
they all have different needs
- git clean:
list of affected files and directories separated by '\0'.
I think we could get away with only the directories, but adding
the filenames as well might make the recovery part simpler.
- git reset, etc.:
the following info is stored for each file that is modified by the
original command.
32-bit signature
32-bit number of index entries
32-bit mode (object type + unix permissions)
160-bit SHA-1
16-bit flags (extra careful here what we want to do with the
assume valid flag)
path name (variable length)
resolve-undo extension (same format as in the index)
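The formats above could be serialized roughly as follows. This is only a
Python sketch under several assumptions that are not in the proposal: the
version string "1", the 4-byte signature "RSTR", network byte order for the
binary fields (as in the index file), and NUL-terminated path names. All
function names are hypothetical, and the resolve-undo extension is left out.

```python
import struct

FORMAT_VERSION = "1"             # assumed; the mail only names $versionnumber
SIGNATURE = b"RSTR"              # hypothetical 4-byte signature
ENTRY = struct.Struct(">I20sH")  # 32-bit mode, 160-bit SHA-1, 16-bit flags


def format_history_message(command, branch, sha1, overwritten=None):
    """Commit message for refs/restore/history.

    `overwritten` is an optional (branchname, sha1) pair, only used for --move.
    """
    fields = [FORMAT_VERSION, command, branch, sha1]
    if overwritten is not None:
        fields.extend(overwritten)
    return "".join(field + "\n" for field in fields)


def parse_history_message(message):
    """Return the list of newline-terminated fields (4, or 6 for --move)."""
    if not message.endswith("\n"):
        raise ValueError("message must end with a newline")
    fields = message.split("\n")[:-1]  # drop the empty tail after the last "\n"
    if len(fields) not in (4, 6):
        raise ValueError("expected 4 or 6 fields, got %d" % len(fields))
    return fields


def encode_clean_note(paths):
    """git clean note: affected files and directories separated by '\\0'."""
    return b"\0".join(path.encode("utf-8") for path in paths)


def decode_clean_note(data):
    return [p.decode("utf-8") for p in data.split(b"\0")] if data else []


def encode_reset_note(entries):
    """git reset note: signature, entry count, then one record per file.

    Each entry is a (mode, sha1_bytes, flags, path) tuple.
    """
    entries = list(entries)
    chunks = [SIGNATURE, struct.pack(">I", len(entries))]
    for mode, sha1, flags, path in entries:
        if len(sha1) != 20:
            raise ValueError("SHA-1 must be exactly 20 bytes")
        chunks.append(ENTRY.pack(mode, sha1, flags))
        chunks.append(path.encode("utf-8") + b"\0")  # assumed terminator
    return b"".join(chunks)


def decode_reset_note(data):
    if data[:4] != SIGNATURE:
        raise ValueError("bad signature")
    (count,) = struct.unpack_from(">I", data, 4)
    offset, entries = 8, []
    for _ in range(count):
        mode, sha1, flags = ENTRY.unpack_from(data, offset)
        offset += ENTRY.size
        end = data.index(b"\0", offset)
        entries.append((mode, sha1, flags, data[offset:end].decode("utf-8")))
        offset = end + 1
    return entries
```

A recover command would parse the history message first, then pick the note
decoder that matches the recorded command.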
Alternatives:
- Have a history for each branch in refs/restore/$branchname.
* Advantages:
Each branch has its own history, which can lead to fewer conflicts
when restoring (e.g. user uses `git reset --hard` on one branch,
switches to another branch, works (potentially adds more stuff to
this branch), later goes back to the old branch, discovers `git
reset --hard` was actually the wrong thing to do and would like
the data back).
* Disadvantages:
It is harder for the user to intuitively know what git restore
will do exactly.
It's much more limited when we want to extend it to branch
removals, etc.
- Storing additional information in the refs/restore/history ref
* Advantages:
No need for extra notes
* Disadvantages:
Data doesn't get garbage collected without user interaction,
potentially blowing up the repository size. Especially using `git
clean`, where binary files might be involved.
- Store the whole index in the note
* Advantages:
Simpler way of restoring the index (including all of the
extensions)
* Disadvantages:
Need to take care of both the index and the split index.
Will consume a lot more disk space in the normal case (only a few
of the files in the repository are changed, while the majority
remains unchanged).
- Store the changed files in refs/restore/history instead of a new
commit on the tip of the current branch.
* Advantages:
All the information is in one place.
Data will not be garbage collected.
* Disadvantages:
Data will not be garbage collected. (Repository size is probably
going to blow up after a while)
It takes more effort to find the parent and diff against it.