From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Martin Fick <mfick@codeaurora.org>
Cc: Taylor Blau <ttaylorr@github.com>, Sun Chao <16657101987@163.com>,
	Taylor Blau <me@ttaylorr.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 08:32:35 +0200
Message-ID: <875yx5wkt2.fsf@evledraar.gmail.com>
In-Reply-To: <3112447.ymCj9SdLpg@mfick-lnx>

On Wed, Jul 14 2021, Martin Fick wrote:
> On Wednesday, July 14, 2021 9:41:42 PM MDT you wrote:
>> On Wed, Jul 14 2021, Martin Fick wrote:
>> > On Wednesday, July 14, 2021 8:19:15 PM MDT Ævar Arnfjörð Bjarmason wrote:
>> >> The best way to get backups of git repositories you know are correct
>> >> is to use git's own transport mechanisms, i.e. fetch/pull the data, or
>> >> create bundles from it.
>> >
>> > I don't think this is a fair recommendation since unfortunately, this
>> > cannot be used to create a full backup. This can be used to back up the
>> > version controlled data, but not the repositories meta-data, i.e.
>> > configs, reflogs, alternate setups...
>>
>> *nod*
>>
>> FWIW at an ex-job I helped systems administrators who'd produced such a
>> broken backup-via-rsync create a hybrid version as an interim
>> solution. I.e. it would sync the objects via git transport, and do an
>> rsync on a whitelist (or blacklist), so pick up config, but exclude
>> objects.
>>
>> "Hybrid" because it was in a state of needing to deal with manual
>> tweaking of config.
>>
>> But usually someone who's needing to thoroughly solve this backup
>> problem will inevitably end up with wanting to drive everything that's
>> not in the object or refstore from some external system, i.e. have
>> config be generated from puppet, a database etc., ditto for alternates
>> etc.
>>
>> But even if you can't get to that point (or don't want to) I'd say aim
>> for the hybrid system.
>>
>> This isn't some purely theoretical concern b.t.w., the system using
>> rsync like this was producing repos that wouldn't fsck all the time, and
>> it wasn't such a busy site.
>>
>> I suspect (but haven't tried) that for someone who can't easily change
>> their backup solution they'd get most of the benefits of git-native
>> transport by having their "rsync" sync refs, then objects, not the other
>> way around. Glob order dictates that most backup systems will do
>> objects, then refs (which will of course, at that point, refer to
>> nonexisting objects).
>>
>> It's still not safe, you'll still be subject to races, but probably a
>> lot better in practice.
>
> It would be great if git provided a command to do a reliable incremental
> backup, maybe it could copy things in the order that you mention?

I don't think we can or want to support this sort of thing ever, for the
same reason that you probably won't convince MySQL, PostgreSQL etc. that
they should support "cp -r" as a mode for backing up their live database
services.

I mean, there is the topic of git being lazy about fsync() etc., but even
if all of that were 100% solved you'd still get bad things if you picked
an arbitrary time to snapshot a running git directory, e.g. your
"master" branch might have a "master.lock" because it was in the middle
of an update.

If you used "fetch/clone/bundle" etc. to get the data, no problem, but if
your snapshot happens at that moment then you'd need to manually clean
that up: a situation which in practice wouldn't persist in the live
repository, but would be persistent with a snapshot approach.
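
To make the "master.lock" case concrete, here is a minimal sketch of
checking a copied git directory for leftover lock files. It assumes the
copy is a plain directory tree, and find_stale_locks() is a hypothetical
helper made up for illustration, not something git provides. It only
reports "*.lock" files; whether deleting them is safe is a separate
question.

```python
from pathlib import Path


def find_stale_locks(git_dir):
    """Return sorted "*.lock" paths, relative to git_dir.

    Hypothetical helper for inspecting a *snapshot* of a git directory.
    In a live repository a lock file normally disappears when the
    writer finishes; in a copy there is no writer left to remove it.
    """
    root = Path(git_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.lock")
        if p.is_file()
    )
```

E.g. running it on a snapshot taken in the middle of an update to
"master" would be expected to list "refs/heads/master.lock".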

> However, most people will want to use the backup system they have and not a
> special git tool. Maybe git fsck should gain a switch that would rewind any
> refs to an older point that is not broken (using reflogs)? That way, most
> backups would just work and be rewound to the point at which the backup
> started?

I think the main problem in the wild is not the inability to use a
special tool, but one of education. Most people wouldn't think of "cp
-r" as a first approach to, say, backing up a live mysql server, they'd
use mysqldump and the like.

But for some reason git is considered "not a database" enough that those
same people would just use rsync/tar/whatever, and are then surprised
when their data is corrupt or in some weird or inconsistent state...

Anyway, see also my just-posted:
https://lore.kernel.org/git/878s21wl4z.fsf@evledraar.gmail.com/

I.e. I'm not saying "never use rsync", there are cases where that's fine,
but for a live "real" server I'd say solutions in that class shouldn't
be considered, or should be actively migrated away from.
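
For reference, the "hybrid" setup quoted further up (objects and refs
via git transport, everything else via rsync) could be sketched roughly
like this. It's an untested illustration under several assumptions: the
backup is an already-initialized bare repository, both paths are local,
and the exclude list is just one plausible blacklist.
hybrid_backup_commands() is a made-up name, and it only builds the
command lines rather than running anything.

```python
def hybrid_backup_commands(source_git_dir, backup_git_dir):
    """Build argv lists for a two-step hybrid backup; nothing is run.

    Assumptions (not from any git documentation): backup_git_dir is an
    existing bare repository, and the excludes below are an example
    blacklist that should be adjusted per site.
    """
    src = source_git_dir.rstrip("/")
    dst = backup_git_dir.rstrip("/")

    # Step 1: objects and refs travel over git's own transport, so the
    # backup only ever gains refs whose objects it has received.
    fetch = [
        "git", "--git-dir=" + dst,
        "fetch", "--prune", src,
        "+refs/*:refs/*",
    ]

    # Step 2: rsync picks up the remaining files (config, hooks,
    # reflogs, ...), skipping what step 1 already handled and any
    # in-flight lock files. Note that excluding objects/ also skips
    # objects/info/alternates.
    rsync = [
        "rsync", "-a",
        "--exclude=/objects/",
        "--exclude=/refs/",
        "--exclude=/packed-refs",
        "--exclude=*.lock",
        src + "/",
        dst + "/",
    ]
    return [fetch, rsync]
```

Each returned list could then be handed to something like
subprocess.run(argv, check=True). The rsync half is still subject to the
races discussed in this thread, it just no longer touches the object or
ref store.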