git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Neeraj Singh <nksingh85@gmail.com>
Subject: Re: [PATCH] allow disabling fsync everywhere
Date: Sat, 30 Oct 2021 10:39:50 +0000	[thread overview]
Message-ID: <20211030103950.M489266@dcvr> (raw)
In-Reply-To: <211029.86v91g3vv3.gmgdl@evledraar.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Oct 28 2021, Eric Wong wrote:
> 
> > "core.fsync" and the "GIT_FSYNC" environment variable now exist
> > for disabling fsync() even on packfiles and other critical data.
> 
> First off, I think this would be useful to add, even as a non-test
> thing.

<snip>

Agreed, I've LD_PRELOAD-ed libeatmydata via ~/.bashrc at times :>

OTOH, I understand Junio and brian's positions on dealing with
misinformed users losing data.  I'm also lazy when it comes to
docs and support, so I'm leaning towards keeping this exclusively
for those who read source code.

I also know it's easy to add a sysctl knob for disabling fsync
et. al. in the kernel; but it's not something I want to deal
with supporting and documenting, either.

> > Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> > on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> > runtime from ~4 minutes down to ~3 minutes.
> 
> On a related topic: Have you tried having it use an empty template
> directory? I have some unsubmitted patches for that, and if you retain
> all trash dirs you'll find that we have IIRC 100-200MB of just
> accumulated *.sample etc. hooks, out of ~1GB total for the tests.

Not, not yet.  I also expect using reflinks on some Linux FSes
will help users save space in real-world use
(ioctl(dst_fd, FICLONE, src_fd)).

> Per the recent threads about fsync semantics this should really be
> worded to be more scary, i.e. it's not guaranteed that your data is
> on-disk at all. So subsequent git operations might see repo corruption,
> if we're being POSIX-pedantic.

Yeah, I wasn't sure how strongly to word it.  Now it's
undocumented; I suppose we can use your wording if we decide
it's worth documenting.

> > --- a/git-cvsserver.perl
> > +++ b/git-cvsserver.perl
> > @@ -3607,6 +3607,25 @@ package GITCVS::updater;
> >  use strict;
> >  use warnings;
> >  use DBI;
> > +our $_use_fsync;
> 
> s/our/my/?

Erm, yes.  Though it doesn't matter for a standalone script;
probably not worth a reroll..

> I wonder most of all if we really need these perl changes, does it
> really matter for the overall performance that you want, or was it just
> to get all fsync() uses out of the test suite?

Yes to both, or at least an attempt to...  A single fsync() can
end up being equivalent to syncfs().

Fully-disabling fsync in SVN doesn't seem possible, though.
There's a few svnadmin switches (--no-flush-to-disk +
--bdb-txn-nosync) but AFAIK they aren't 100% solutions.
Fortuntanately, NO_SVN_TESTS=1 exists.

> This is a more general comment, but so much of scaffolding like that in
> the *.perl scripts could go away if we taught git.c some "not quite a
> builtin, but it's ours" mode, and had say code for git-send-email,
> git-svn etc. to just set the various data they need in the environment
> before we invoke them. Then this would just be say:
> 
>     our $use_fsync = $ENV{GIT_FOR_CVSSERVER_FSYNC};

That's potentially a lot of environment variables, though.
Maybe just:  eval($ENV{GIT_CFG_CVSSERVER}) if $ENV{GIT_CFG_PID} == $$;
The PID check should be sufficient to avoid mismatches and
malicious code injection.

> > [...]
> >  	my $sync;
> >  	# both of these options make our .rev_db file very, very important
> >  	# and we can't afford to lose it because rebuild() won't work
> > -	if ($self->use_svm_props || $self->no_metadata) {
> > +	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
> >  		require File::Copy;
> >  		$sync = 1;
> >  		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
> 
> This doesn't just impact $io->sync, but also $io->flush, which is a
> different thing. PerlIO flushing to the OS != fsync().

Right, close($fh) (right below[1]) also takes care of PerlIO
flushing.  ->flush was only there for fsync.

[1] https://yhbt.net/lore/git/7b333ea62e/s/?b=perl/Git/SVN.pm#n2334

> This patch direction looks good to me, aside from the above we should
> really update any other fsync() docs we have, maybe with a
> cross-reference in core.fsyncObjectFiles?

Alright, though I'm happy with it being undocumented, for now.

> I'm not sure offhand what the state of the other fsync patches that
> Neeraj Singh has been submitting is, but let's make sure that whatever
> docs/config knobs etc. we have all play nicely together, and are
> somewhat future-proof if possible.

It looks like it some changes will be necessary to keep tests fast
after that lands.

      reply	other threads:[~2021-10-30 10:49 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
2021-10-28  1:21 ` Eric Sunshine
2021-10-28 14:36 ` Jeff King
2021-10-28 18:28   ` Eric Wong
2021-10-28 19:29     ` Junio C Hamano
2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
2021-10-29  5:18         ` Junio C Hamano
2021-10-29  7:56           ` Eric Wong
2021-10-29 18:12             ` Junio C Hamano
2021-10-29  7:33         ` Junio C Hamano
2021-10-29  7:48           ` Eric Wong
2021-10-29 17:22             ` Junio C Hamano
2021-10-29 20:34         ` Jeff King
2021-10-29 20:42           ` Junio C Hamano
2021-10-28 21:40     ` [PATCH] allow disabling " brian m. carlson
2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
2021-10-30 10:39   ` Eric Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211030103950.M489266@dcvr \
    --to=e@80x24.org \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=nksingh85@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).