From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Martin Fick <mfick@codeaurora.org>
Cc: Taylor Blau <ttaylorr@github.com>, Sun Chao <16657101987@163.com>,
	Taylor Blau <me@ttaylorr.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 08:32:35 +0200
Message-ID: <875yx5wkt2.fsf@evledraar.gmail.com>
In-Reply-To: <3112447.ymCj9SdLpg@mfick-lnx>


On Wed, Jul 14 2021, Martin Fick wrote:

> On Wednesday, July 14, 2021 9:41:42 PM MDT you wrote:
>> On Wed, Jul 14 2021, Martin Fick wrote:
>> > On Wednesday, July 14, 2021 8:19:15 PM MDT Ævar Arnfjörð Bjarmason wrote:
>> >> The best way to get backups of git repositories you know are correct
>> >> is to use git's own transport mechanisms, i.e. fetch/pull the data, or
>> >> create bundles from it.
>> > 
>> > I don't think this is a fair recommendation since unfortunately, this
>> > cannot be used to create a full backup. This can be used to back up the
>> > version controlled data, but not the repository's meta-data, i.e.
>> > configs, reflogs, alternate setups...
>> 
>> *nod*
>> 
>> FWIW at an ex-job I helped systems administrators who'd produced such a
>> broken backup-via-rsync create a hybrid version as an interim
>> solution. I.e. it would sync the objects via git transport, and do an
>> rsync on a whitelist (or blacklist), so as to pick up config but
>> exclude objects.
>> 
>> "Hybrid" because it was in a state of needing to deal with manual
>> tweaking of config.
>> 
>> But usually someone who's needing to thoroughly solve this backup
>> problem will inevitably end up with wanting to drive everything that's
>> not in the object or refstore from some external system, i.e. have
>> config be generated from puppet, a database etc., ditto for alternates
>> etc.
>> 
>> But even if you can't get to that point (or don't want to) I'd say aim
>> for the hybrid system.
>> 
>> This isn't some purely theoretical concern b.t.w., the system using
>> rsync like this was producing repos that wouldn't fsck all the time, and
>> it wasn't such a busy site.
>> 
>> I suspect (but haven't tried) that for someone who can't easily change
>> their backup solution they'd get most of the benefits of git-native
>> transport by having their "rsync" sync refs, then objects, not the other
>> way around. Glob order dictates that most backup systems will do
>> objects, then refs (which will of course, at that point, refer to
>> nonexisting objects).
>> 
>> It's still not safe, you'll still be subject to races, but probably a
>> lot better in practice.
>
> It would be great if git provided a command to do a reliable incremental 
> backup, maybe it could copy things in the order that you mention?

I don't think we can or want to support this sort of thing ever, for the
same reason that you probably won't convince MySQL, PostgreSQL etc. that
they should support "cp -r" as a mode for backing up their live database
services.

I mean, there is the topic of git being lazy about fsync() etc, but even
if all of that were 100% solved you'd still get bad things if you picked
an arbitrary time to snapshot a running git directory, e.g. your
"master" branch might have a "master.lock" because it was in the middle
of an update.

If you used "fetch/clone/bundle" etc. to get the data that's no problem,
but if your snapshot happens at that moment you'd need to clean up the
lock manually. On a live repository that state is transient and wouldn't
persist, but a snapshot preserves it permanently.
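
To illustrate the cleanup problem, here is a minimal sketch that lists
leftover "*.lock" files in a restored snapshot. It assumes Python and a
copy that no git process is using; find_lock_files() and demo() are
hypothetical helper names, not part of git. It only reports the files,
since deciding whether to delete them is still a manual call.

```python
import os
import tempfile


def find_lock_files(git_dir):
    """Return sorted paths, relative to git_dir, of leftover '*.lock' files."""
    found = []
    for root, _dirs, files in os.walk(git_dir):
        for name in files:
            if name.endswith(".lock"):
                full = os.path.join(root, name)
                found.append(os.path.relpath(full, git_dir))
    return sorted(found)


def demo():
    """Build a fake snapshot with refs/heads/master.lock and scan it."""
    with tempfile.TemporaryDirectory() as snapshot:
        heads = os.path.join(snapshot, "refs", "heads")
        os.makedirs(heads)
        for name in ("master", "master.lock"):
            # Empty placeholder files; only the file names matter here.
            with open(os.path.join(heads, name), "w") as fh:
                fh.write("")
        return find_lock_files(snapshot)
```

Don't run something like this against a live repository: there a lock
file usually means an update is in progress, not that one was abandoned.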

> However, most people will want to use the backup system they have and not a 
> special git tool. Maybe git fsck should gain a switch that would rewind any 
> refs to an older point that is not broken (using reflogs)? That way, most 
> backups would just work and be rewound to the point at which the backup 
> started?

I think the main problem in the wild is not the inability to use a
special tool, but one of education. Most people wouldn't think of "cp
-r" as a first approach to, say, backing up a live MySQL server, they'd
use mysqldump and the like.

But for some reason git is considered "not a database" enough that those
same people would just use rsync/tar/whatever, and are then surprised
when their data is corrupt or in some weird or inconsistent state...
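
The closest git analogue to a dump is a bundle. A minimal sketch,
assuming Python with git on PATH; bundle_commands() and
backup_to_bundle() are hypothetical names for illustration. As noted
upthread this only covers refs and objects, not config, reflogs or
alternates.

```python
import os
import subprocess


def bundle_commands(repo_dir, bundle_path):
    """Return the git invocations for a full bundle backup plus a check."""
    # "git -C" changes directory first, so make the output path absolute.
    out = os.path.abspath(bundle_path)
    return [
        # Write every ref and the objects reachable from them into one file.
        ["git", "-C", repo_dir, "bundle", "create", out, "--all"],
        # Check that the bundle is valid and usable from this repository.
        ["git", "-C", repo_dir, "bundle", "verify", out],
    ]


def backup_to_bundle(repo_dir, bundle_path):
    """Run the commands in order, raising if either one fails."""
    for cmd in bundle_commands(repo_dir, bundle_path):
        subprocess.run(cmd, check=True)
```

Because git itself reads the refs and objects here, the result doesn't
depend on the order in which a file-level copy happened to visit the
directory.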

Anyway, see also my just-posted:
https://lore.kernel.org/git/878s21wl4z.fsf@evledraar.gmail.com/

I.e. I'm not saying "never use rsync", there are cases where that's fine,
but for a live "real" server I'd say solutions in that class shouldn't
be considered, and should be actively migrated away from.
