git.vger.kernel.org archive mirror
From: avarab@gmail.com (Ævar Arnfjörð Bjarmason)
To: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
Cc: git@vger.kernel.org
Subject: Re: File versioning based on shallow Git repositories?
Date: Thu, 12 Apr 2018 20:47:21 +0200
Message-ID: <87d0z4b6ti.fsf@evledraar.gmail.com>
In-Reply-To: <hbf.20180412fvfi@bombur.uio.no>


On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> Can I use a shallow Git repo for file versioning, and regularly purge
> history older than e.g. 2 weeks?  Purged data MUST NOT be recoverable.
>
> Or is there a backup tool based on shallow Git cloning which does this?
> Push/pull to another shallow repo would be nice but is not required.
> The files are text files up to 1/4 Gb, usually with few changes.
>
>
> If using Git - I see "git fetch --depth" can shorten history now.
> How do I do that without 'fetch', in the origin repo?
> Also Documentation/technical/shallow.txt describes some caveats, I'm
> not sure how relevant they are.
>
> To purge old data -
>   git config core.logallrefupdates false
>   git gc --prune=now --aggressive
> Anything else?
>
> I'm guessing that without --aggressive, some expired info might be
> deduced from studying the packing of the remaining objects.  Don't
> know if we'll be required to be that paranoid.

The shallow feature is not meant for this use case, but there's a much
easier solution that I've used for exactly this purpose, e.g. taking
backups of SQL dumps that delta-compress well and then throwing out old
backups.

You:

1. Create a backup.git repo
2. Each time you make a backup, check out a new orphan branch, see "git
   checkout --orphan"
3. Copy the files over and commit them. At this point "git log" shows
   one commit, no matter how many backups you've made before.
4. Create a tag for this backup, e.g. one named after the current
   time, then delete the branch.
5. Apply a retention period to the tags, e.g. keep only the last 30
   tags if you want 30 days of daily backups.
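Steps 1-4 could be scripted roughly as below. This is a minimal sketch, not the setup described above: the function name make_backup, the "backup-<UTC timestamp>" tag names, the temporary branch name backup-tmp and the committer identity are all made up for illustration, and it assumes at most one backup per second. Because backup.git is bare, it points GIT_DIR at the repo and GIT_WORK_TREE at the directory being backed up. Instead of "git checkout --orphan" it uses "git symbolic-ref" plus a fresh index, which has the same effect of making the next commit a root commit. It also sets core.logAllRefUpdates=false, as suggested in the question, so no reflog keeps old commits alive.

```sh
# make_backup REPO SRC_DIR [STAMP]
# Hypothetical helper: records one backup of SRC_DIR in the bare repo REPO
# as a parentless (orphan) commit, tagged "backup-<STAMP>".
make_backup() (
    set -eu
    repo=$1
    src=$2
    # Assumed naming scheme: a UTC timestamp, which also sorts chronologically.
    stamp=${3:-$(date -u +%Y%m%dT%H%M%SZ)}

    # Step 1: create the bare backup repository on first use.
    [ -d "$repo" ] || git init -q --bare "$repo"

    # Run git against the bare repo, using SRC_DIR as the work tree.
    # Absolute paths keep this working after the cd below.
    GIT_DIR=$(cd "$repo" && pwd)
    GIT_WORK_TREE=$(cd "$src" && pwd)
    export GIT_DIR GIT_WORK_TREE
    cd "$GIT_WORK_TREE"

    # Don't write reflogs, so deleted tags leave nothing referencing old commits.
    git config core.logAllRefUpdates false

    # Step 2: point HEAD at a branch that does not exist yet, so the next
    # commit has no parent, and start from an empty index so nothing from
    # the previous backup carries over.
    git symbolic-ref HEAD refs/heads/backup-tmp
    rm -f "$GIT_DIR/index"

    # Step 3: snapshot the files as a single root commit.
    # The committer identity is a placeholder.
    git add -A
    git -c user.name="backup script" -c user.email="backup@localhost" \
        commit -q -m "Backup $stamp"

    # Step 4: keep the snapshot as a tag, then delete the temporary branch.
    git tag "backup-$stamp"
    git update-ref -d refs/heads/backup-tmp
)
```

E.g. "make_backup /srv/backup.git /srv/dumps" from a daily cron job; both paths are placeholders.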

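As soon as you delete a tag, its commit becomes unreferenced, and you
can make git-gc delete the data. By default git-gc only prunes
unreachable objects older than two weeks (gc.pruneExpire), so use
--prune=now, as in your command, to delete them right away.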
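Step 5 and the clean-up might look like this, again as a sketch under assumptions: prune_backups is a made-up name, it relies on the "backup-<UTC timestamp>" tag naming so that the name order of "git tag -l" is also age order, and KEEP is whatever retention you pick (30 in the example above). The reflog expiry is there in case reflogs were enabled at some point; with core.logAllRefUpdates=false there should be none.

```sh
# prune_backups REPO KEEP
# Hypothetical helper: keeps the newest KEEP "backup-*" tags in REPO, deletes
# the rest, and makes git-gc drop the now-unreferenced objects immediately.
prune_backups() (
    set -eu
    GIT_DIR=$1
    export GIT_DIR
    keep=$2

    # "git tag -l" sorts by name; with timestamped names that is oldest first.
    count=$(git tag -l 'backup-*' | wc -l)
    excess=$((count - keep))
    if [ "$excess" -gt 0 ]; then
        git tag -l 'backup-*' | head -n "$excess" | xargs git tag -d >/dev/null
    fi

    # Expire any reflog entries that might still point at deleted commits,
    # then prune unreachable objects now rather than after gc.pruneExpire.
    git reflog expire --expire=now --all
    git gc -q --prune=now
)
```

E.g. "prune_backups /srv/backup.git 30" after each daily backup.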

You'll still be able to `git diff` between tags, even though they have
unrelated histories, and the files will still delta-compress.
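To illustrate the "git diff" point, here is a small wrapper (the name diff_backups is made up) that compares two backup tags directly. "git diff" between two commits only compares their trees, so it doesn't matter that the tags share no history.

```sh
# diff_backups REPO OLD_TAG NEW_TAG [PATH...]
# Hypothetical helper: shows what changed between two backups, optionally
# limited to the given paths.
diff_backups() (
    repo=$1
    old=$2
    new=$3
    shift 3
    git --git-dir="$repo" diff "$old" "$new" -- "$@"
)
```

For the same two tags "git merge-base" finds no common ancestor, yet the diff works as usual.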
