git.vger.kernel.org archive mirror
* Idea: indirect authorship info
@ 2023-09-11  4:17 Yawar Amin
  2023-09-11  5:23 ` Junio C Hamano
  0 siblings, 1 reply; 2+ messages in thread
From: Yawar Amin @ 2023-09-11  4:17 UTC
  To: git

Hi,

I have an idea about enabling management of authorship info in a git repo, to
make it easier to manage and potentially remove author/committer/tagger PII (in
the context of GDPR), without having to change any commit history/SHAs.
Apologies if this has been brought up before, but I failed to find anything
relevant from some quick searches:

- https://lore.kernel.org/git/?q=indirect+authors
- https://lore.kernel.org/git/?q=authors+file
- https://lore.kernel.org/git/?q=people+file

Potential use cases:

- Someone requests that their names and email addresses be removed from a public
  repo's history under GDPR Right to be Forgotten (although, based on [1], it's
  not clear if projects could be forced to honour RTBF or not)
- Someone requests that their legal name change be reflected in a public repo

I am interested in hearing your thoughts on this basic idea:

- On committing/tagging/creating a note, the identity of the author is not saved
  in the commit etc. object itself but in a separate file e.g. `.git/people`:
  `d0efaf97-e18a-4197-b2c0-61c05efec75e <-> Yawar Amin <yawar.amin@gmail.com>`
- Instead of the real identity of the author, a pointer to the `people` file
  entry is stored e.g. `Author: d0efaf97-e18a-4197-b2c0-61c05efec75e`
- If an entry for the person already exists in the `people` file, it is reused
- When syncing with a remote repo, new entries in the `people` file are synced
  along with other objects (in an append-only manner, not editing existing
  entries)
- Git uses the `people` file to cross-reference and fill in authorship
  info when it renders commit etc. objects like in `git log`, `git show` etc.
- If git can't find the authorship info in the `people` file it renders some
  appropriate default e.g. `(Unknown)`

Project owners (with write access to the hosted repo) are able to edit and push
changes to the `people` file. In this way they can fulfill change requests like
the ones I mentioned above.
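
A minimal sketch of the lookup described above, to make the idea concrete. It
assumes the one-entry-per-line `key <-> Name <email>` format from the example;
the function names (`parse_people`, `key_for`, `render_author`) are hypothetical
and nothing like this exists in git today:

```python
import re
import uuid

# Hypothetical line format, taken from the example entry in this mail:
#   <key> <-> <Name> <<email>>
ENTRY_RE = re.compile(r"^(?P<key>\S+) <-> (?P<name>.*) <(?P<email>[^<>]*)>$")


def parse_people(text):
    """Parse the contents of a `people` file into {key: (name, email)}."""
    people = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            people[m["key"]] = (m["name"], m["email"])
    return people


def key_for(people, name, email):
    """Reuse the existing entry for this identity, or append a new one."""
    for key, ident in people.items():
        if ident == (name, email):
            return key
    key = str(uuid.uuid4())  # random key, as in the proposal
    people[key] = (name, email)
    return key


def render_author(people, key):
    """Fill in authorship info for display, with a default if it is missing."""
    ident = people.get(key)
    if ident is None:
        return "(Unknown)"
    return f"{ident[0]} <{ident[1]}>"
```

Removing or editing an entry in the mapping changes what `render_author`
returns for every commit that points at that key, while the commit objects
themselves stay untouched.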

Regards,

Yawar

[1] https://www.dataprotection.ie/en/individuals/know-your-rights/right-erasure-articles-17-19-gdpr


* Re: Idea: indirect authorship info
  2023-09-11  4:17 Idea: indirect authorship info Yawar Amin
@ 2023-09-11  5:23 ` Junio C Hamano
  0 siblings, 0 replies; 2+ messages in thread
From: Junio C Hamano @ 2023-09-11  5:23 UTC
  To: Yawar Amin; +Cc: git

Yawar Amin <yawar.amin@gmail.com> writes:

> Apologies if this has been brought up before, but I failed to find anything
> relevant from some quick searches:
>
> - https://lore.kernel.org/git/?q=indirect+authors
> - https://lore.kernel.org/git/?q=authors+file
> - https://lore.kernel.org/git/?q=people+file

Try "mailmap+deadname".  You essentially reinvented these earlier
proposals, except that they reused the existing ".mailmap"
mechanism and chose the key differently.

While I cannot think of any reason why the desire to achieve the end
goal of these efforts is bad, some parts of your design (and of the
other proposals) need rethinking.

Projects often need to know and show who did what for legal reasons.
Imagine an old commit needs to be shown to document who made the
contribution to the project.  An in-tree ".mailmap" file can give an
adequate guarantee: the ".mailmap" file recorded in the tree of that
particular commit documents who the anonymized author name carried by
the commit really is, even if some contributors who stopped
contributing and asked to go back to anonymity have since
disappeared from that file.  But they may still
appear if you did "git log -p .mailmap", which would meet the needs
of these projects, but it means that you cannot be forgotten and
have to live with the consequences of what you did in the past.  On
the other hand, if there is no in-tree ".mailmap" (or "people")
file, it brings up many questions.  It becomes unclear who will keep
track of the latest version.  There needs to be a way to guarantee that the
entries still in the mapping file can be used to verify the claim
that some person did a particular commit.  Of course, as long as it
is distributed to project participants for communication ("hey you
worked on this feature 6 months ago; can you answer a few
questions?") and verification ("yes, this commit was done by this
person who was not affiliated with company X in any way. how do you
substantiate your claim that this project stole it from the
company?") purposes, somebody will be bound to make and keep copies,
which means that you cannot become truly anonymous, after making
yourself known.

As to the choice of the anonymous key used as a stand-in value for
the author and the committer identity, using something that is not
deterministic (like uuid) is not a good idea.  If the name/address
are hashed with some algorithm that is cryptographically secure and
is one-way, it would probably suffice both for anonymity purposes
(as you need to "reverse" such a hash to get to the real author) and
for easy verification (if you need to "prove" that an anonymised
author 9f5d8e44edfb7e1aa4dcf34acc3b4d643f83e1b6 recorded in the
commit object is an author with a known name/address, you can feed
that name/address to the hash function and if it hashes to the same
value, that claim is as good as having the name/address directly
recorded in the commit object).
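
A minimal sketch of that verification, under stated assumptions: the
choice of SHA-256 and the "Name <address>" UTF-8 encoding that gets
hashed are illustrative only (no concrete algorithm or encoding is
specified here), and the function names are hypothetical.

```python
import hashlib


def anon_key(name, email):
    """Derive a deterministic stand-in key from a name/address pair.

    Assumption: the identity is canonicalised as "Name <address>" in UTF-8
    and hashed with SHA-256; any one-way cryptographic hash would do.
    """
    ident = f"{name} <{email}>".encode("utf-8")
    return hashlib.sha256(ident).hexdigest()


def verify_claim(recorded_key, name, email):
    """Check the claim that `recorded_key` stands for this name/address."""
    return anon_key(name, email) == recorded_key


# The key that would be recorded in the commit object instead of the identity.
key = anon_key("A U Thor", "author@example.com")

# Anybody who knows the name/address can later substantiate the claim.
print(verify_claim(key, "A U Thor", "author@example.com"))      # True
print(verify_claim(key, "Somebody Else", "else@example.com"))   # False
```

Because the key is derived from the identity, no mapping file is
needed to verify a claim, unlike with a random uuid.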


