From: <rsbecker@nexbridge.com>
To: "'Sean Allred'" <allred.sean@gmail.com>
Cc: "'Junio C Hamano'" <gitster@pobox.com>, <git@vger.kernel.org>,
	<sallred@epic.com>, <grmason@epic.com>, <sconrad@epic.com>
Subject: RE: Dealing with corporate email recycling
Date: Sun, 13 Mar 2022 11:02:24 -0400
Message-ID: <01f301d836eb$5c7a6810$156f3830$@nexbridge.com>
In-Reply-To: <87v8whap0b.fsf@gmail.com>

On March 13, 2022 10:41 AM, Sean Allred wrote:
><rsbecker@nexbridge.com> writes:
>> (I am a little nervous about this advice, hoping others will chime in
>> and correct anything wrong here)
>>
>> While this will change the commit hashes, AFAIK, the other metadata is
>> preserved, including date, author, and committer. Set up the specific
>> keys/settings in ssh-agent and the user.signingKey value, then:
>>
>> git filter-branch --commit-filter 'git commit-tree -S "$@";'
>> <FROM-COMMIT>..<TO-COMMIT>
>>
>> Others might have a better way of doing this or may tell me this will
>> not work. Test this before you do it. I have not done this operation
>> before. You do need to start from the oldest commit going forward
>> otherwise I think that filter-branch will (should!) invalidate child
>> commits. I suspect this is going to be a rather lengthy script to build and run.
>
>Given the size of our history (several orders of magnitude larger than
>linux.git), using git-filter-branch after the fact is certainly not ideal.
>The replay already takes a week to run (we're IO-bound).  We'd rather
>extend git-fast-import to allow signing commits instead -- which comes
>back to our shared 'nervousness' about this approach in general: I don't
>know that Git should endorse this as a standard option.
>
>But yes -- hoping others can chime in with more thoughts :-)

I have another reluctant suggestion, but it depends on your industry, regulations, and other factors. In some sectors, there is a requirement to keep only a limited period of history. In fact, in some settings, keeping user-identifying information beyond, say, 7 years is actually problematic. Pruning your history may be not only an option but a requirement. An alternative is to use filter-branch to essentially tokenize the identities of past authors and keep those in an electronic vault somewhere. I have customers who are interpreting GDPR-like rules to cover just such a situation, where employees who left 7 years ago cannot be retained, by name, in the repos. I am not personally happy about that, because my own repo-OCD demands that I know exactly who did what until the end of time, but according to them, keeping the names actually violates the local regulations. I'm sure you have had conversations with lawyers, yes? ☹
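
To make the tokenizing idea a bit more concrete, here is a rough, untested-at-scale sketch. The helper name tokenize_identity, the TOKEN_SALT variable, and the @tokens.invalid domain are all my own placeholders, not anything Git provides. The only Git pieces assumed are the --env-filter option of filter-branch and the GIT_AUTHOR_NAME / GIT_AUTHOR_EMAIL variables it exposes. The mapping from token back to the real person would have to be recorded separately in the vault. As written, the filter would tokenize every author in the range, so a real script would need a check for which identities to replace, and would need to handle the committer fields as well.

```shell
#!/bin/sh
# Sketch only: tokenize_identity, TOKEN_SALT and the @tokens.invalid domain
# are hypothetical names introduced for illustration.

# Derive a stable, opaque 12-character token from an email address.
# The same input always yields the same token, so one author maps to one token.
tokenize_identity() {
    printf '%s:%s' "${TOKEN_SALT:-example-salt}" "$1" | sha256sum | cut -c1-12
}

# Example: print the token that would replace one address.
tokenize_identity "former.employee@example.com"

# In a rewrite, the same pipeline could run from an --env-filter
# (TOKEN_SALT must be exported first; try this on a throwaway clone):
#
#   git filter-branch --env-filter '
#       tok=$(printf "%s:%s" "$TOKEN_SALT" "$GIT_AUTHOR_EMAIL" | sha256sum | cut -c1-12)
#       GIT_AUTHOR_NAME="author-$tok"
#       GIT_AUTHOR_EMAIL="$tok@tokens.invalid"
#   ' <FROM-COMMIT>..<TO-COMMIT>
```

Like the signing command quoted above, this changes every rewritten commit hash, and on a history of your size it would run into the same run-time concerns.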

