From: <rsbecker@nexbridge.com>
To: "'Sean Allred'" <allred.sean@gmail.com>
Cc: "'Junio C Hamano'" <gitster@pobox.com>, <git@vger.kernel.org>,
	<sallred@epic.com>, <grmason@epic.com>, <sconrad@epic.com>
Subject: RE: Dealing with corporate email recycling
Date: Sun, 13 Mar 2022 11:02:24 -0400
Message-ID: <01f301d836eb$5c7a6810$156f3830$@nexbridge.com>
In-Reply-To: <87v8whap0b.fsf@gmail.com>

On March 13, 2022 10:41 AM, Sean Allred wrote:
><rsbecker@nexbridge.com> writes:
>> (I am a little nervous about this advice, hoping others will chime in
>> and correct anything wrong here)
>>
>> While this will change the commit hashes, AFAIK, the other metadata is
>> preserved, including date, author, and committer. Set up the specific
>> keys/settings in ssh-agent and the user.signingKey value, then:
>>
>> git filter-branch --commit-filter 'git commit-tree -S "$@";'
>> <FROM-COMMIT>..<TO-COMMIT>
>>
>> Others might have a better way of doing this or may tell me this will
>> not work. Test this before you do it. I have not done this operation
>> before. You do need to start from the oldest commit going forward
>> otherwise I think that filter-branch will (should!) invalidate child
>> commits. I suspect this is going to be a rather lengthy script to build and run.
>
>Given the size of our history (several orders of magnitude larger than linux.git),
>using git-filter-branch after the fact is certainly not ideal.  The replay already takes
>a week to run (we're IO-bound).  We'd rather want to extend git-fast-import to
>allow signing commits instead
>-- which comes back to our shared 'nervousness' about this approach in
>general: I don't know that Git should endorse this as a standard option.
>
>But yes -- hoping others can chime in with more thoughts :-)

I have another reluctant suggestion, but it depends on your industry, regulations, and other factors. In some sectors, there is a requirement to keep only a limited period of history. In fact, in some settings, keeping user-identifying information beyond, say, 7 years is actually problematic. Pruning your history may be not only an option but a requirement. An alternative is to use filter-branch to essentially tokenize the identities of past authors and keep those in an electronic vault somewhere. I have customers who are interpreting GDPR-like rules in just such a situation, where employees who left more than 7 years ago cannot be retained, by name, in the repos. I am not personally happy about that, because my own repo-OCD demands that I know exactly who did what until the end of time, but according to them, keeping the names actually violates the local regulations. I'm sure you have had conversations with lawyers, yes? ☹
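
To make the tokenizing idea concrete, here is a rough sketch, not something I have run against a real history. It assumes GNU coreutils (sha256sum); the names tokenize_identity, TOKEN_SALT, tokenize.sh, departed.txt, and vault.tsv are all made up for illustration. The rewrite itself is left commented out because, like any filter-branch run, it changes every rewritten commit hash and should only be tried on a scratch clone.

```shell
# Hypothetical helper: map an email address to a stable, opaque token.
# TOKEN_SALT is an assumed secret that lives outside the repository.
tokenize_identity() {
    printf '%s:%s' "$TOKEN_SALT" "$1" | sha256sum | cut -c1-16
}

# Rewrite sketch (assumption-laden; test on a scratch clone first).
# The filter is evaluated inside filter-branch, so the helper above is
# sourced there from a hypothetical /path/to/tokenize.sh. departed.txt
# (one email per line) and vault.tsv are likewise assumptions. The vault
# gets one line per rewritten commit, so de-duplicate it afterwards.
#
# git filter-branch --env-filter '
#     . /path/to/tokenize.sh
#     if grep -qxF "$GIT_AUTHOR_EMAIL" /path/to/departed.txt; then
#         token=$(tokenize_identity "$GIT_AUTHOR_EMAIL")
#         printf "%s\t%s\t%s\n" "$token" "$GIT_AUTHOR_NAME" \
#             "$GIT_AUTHOR_EMAIL" >>/path/to/vault.tsv
#         GIT_AUTHOR_NAME="user-$token"
#         GIT_AUTHOR_EMAIL="$token@invalid"
#     fi
# ' -- --all
```

The committer fields (GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL) would need the same treatment, and the vault file has to live somewhere other than the repository, or the exercise is pointless. Whether a salted hash is an acceptable token under a given regulation is a question for the lawyers, not for me.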



