From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: kreijack@inwind.it
Cc: linux-btrfs@vger.kernel.org, Chris Murphy <lists@colorremedies.com>
Subject: Re: [RFC] btrfs: strategy to perform a rollback at boot time
Date: Fri, 24 Jul 2020 22:37:52 -0400 [thread overview]
Message-ID: <20200725023752.GF5890@hungrycats.org> (raw)
In-Reply-To: <a4074100-b006-7d64-e22d-779ad15191c0@libero.it>
On Fri, Jul 24, 2020 at 01:56:58PM +0200, Goffredo Baroncelli wrote:
> On 7/23/20 11:53 PM, Zygo Blaxell wrote:
> > On Tue, Jul 21, 2020 at 10:33:39PM +0200, Goffredo Baroncelli wrote:
> > >
> > > Hi all,
> > >
> > > this is an RFC to discuss a my idea to allow a simple rollback of the
> > > root filesystem at boot time.
> > >
> > > The problem that I want to solve is the following: DPKG is very slow on
> > > a BTRFS filesystem. The reason is that DPKG massively uses
> > > sync()/fsync() to guarantee that the filesystem is always coherent even
> > > in case of sudden shutdown.
> > >
> > > The same can be useful even to the RPM Linux based distribution (which however
> > > suffer less than DPKG).
> > >
> > > A way to avoid the sync()/fsync() calls without loosing the DPKG
> > > guarantees, is:
> > > 1) perform a snapshot of the root filesystem (the rollback one)
> > > 2) upgrade the filesystem without using sync/fsync
> > > 3) final (global) sync
> > > 4) destroy the rollback snapshot
> >
> > The idea sounds OK, but there are alternatives:
> >
> > 1) perform snapshot of root filesystem
> > 2) chroot snapshot eatmydata apt dist-upgrade (*)
> > 3) sync -f snapshot
> > 4) renameat2(..., snapshot, ..., root, RENAME_EXCHANGE)
> > 5) delete snapshot
> >
> > (*) OK you have to set up /dev, /proc, /sys, etc, probably a whole
> > namespace.
> >
> > This may not play well with maintainer scripts on some distros, but it
> > does mean you don't have a half-broken system _during_ the upgrade.
>
> Also Chris, suggested that. However I don't think that it is a viable solution:
> 1) as you pointed out, most of the maintainer pre/post install scripts assume that the system is "live". So I don't think that it would be possible without auditing and updating all the packages.
> 2) what happens in case of unclean shutdown during step 4 ? To me it seems that we are performing two installations :-) The first one is at step 2 and the second one is at step 3. Moreover a move between two subvolumes is not allowed (it like a copy)
> higo@venice:/tmp$ btrfs sub crea sub1
> Create subvolume './sub1'
> ghigo@venice:/tmp$ btrfs sub crea sub2
> Create subvolume './sub2'
> ghigo@venice:/tmp$ touch sub1/file1
> ghigo@venice:/tmp$ python
> Python 2.7.18 (default, Apr 20 2020, 20:30:41)
> [GCC 9.3.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> > > > import os
> > > > os.rename("sub1/file1", "sub2/file")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> OSError: [Errno 18] Invalid cross-device link
>
> This means that there is an high risk of an incomplete write in case of unplanned shutdown (even tough clone is allowed...)
renameat2() can atomically swap two directories, so it would solve this
specific part of the problem.
>
> >
> > Sometimes when I have a really problematic upgrade I rsync the system
> > to another box, do the upgrade there, and then rsync the system back
> > to the problematic box. As a side-effect it also allows me to do a
> > verification test to make sure the upgrade worked before throwing it
> > onto a production system. The snapshot/rollback thing would be a
> > local version of that.
> >
> > > If an unclean shutdown happens between 1) and 4), two subvolume exists:
> > > the 'main' one and the 'rollback' one (which is the snapshot before the
> > > update). In this case the system at boot time should mount the "rollback"
> > > subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
> > > "rollback" subvolume doesn't exist and only the "main" one can be
> > > mounted.
> > >
> > > In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
> > > the point 3) ).
> > >
> > > The part that was missed until now, is an automatic way to mount the rollback
> > > subvolume at boot time when it is present.
> > >
> > > My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
> > > passed subvolumes until the first succeed. So invoking the kernel as:
> > >
> > > linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
> > >
> > > First, the kernel tries to mount the 'rollback' subvolume. If the rollback
> > > subvolume doesn't exist then it mounts the 'main' subvolume.
> >
> > This could be done already from the initramfs.
>
> Ok, this means that we have three possibility:
> 1) do this at bootloder level (eg grub)
> 2) do this at initramfs
> 3) do this at kernel level (see my patch)
>
> All these possibilities are a viable solution. However I find 1) and 2) the more "intrusive", and distro specific. My fear is that each distro will take a different choice, leading to a more fragmentation.
> I hoped that the solution nr 3, could help to find a unique solution....
>
>
>
>
> >
> > > Of course after the mount, the system should perform a cleanup of the
> > > subvolumes: i.e. if a rollback subvolume exists, the system should destroy
> > > the "main" one (which contains garbage) and rename "rollback" to "main".
> > > To be more precise:
> > >
> > > if test -d "rollback"; then
> > > if test -d "old"; then
> > > btrfs sub del "old"
> > > fi
> > > if test -d "main"; then
> > > mv "main" "old"
> > > fi
> > > mv "rollback" "main"
> > > btrfs sub del "old"
> > > fi
> > >
> > > Comments are welcome
> > > BR
> > > G.Baroncelli
> > >
> > > [1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@libero.it/
> > >
> > > P.S.
> > > I am guessing if an idea like this can be applied to a file. E.g. a sqlite
> > > database that instead of reling to sync/fsync, creates a reflink file as
> > > "rollback" if something goes wrong.... The ordering is preserved. Not the
> > > duration.
> > >
>
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
next prev parent reply other threads:[~2020-07-25 2:44 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
2020-07-21 20:50 ` Steven Davies
2020-07-22 1:12 ` kernel test robot
2020-07-21 20:55 ` [RFC] btrfs: strategy to perform a rollback at boot time Steven Davies
2020-07-23 19:52 ` Goffredo Baroncelli
2020-07-21 21:09 ` Chris Murphy
2020-07-22 0:21 ` Nicholas D Steeves
2020-07-23 20:02 ` Goffredo Baroncelli
2020-07-23 21:53 ` Zygo Blaxell
2020-07-24 11:56 ` Goffredo Baroncelli
2020-07-24 22:08 ` Chris Murphy
2020-07-25 2:37 ` Zygo Blaxell [this message]
2020-07-27 12:26 ` David Sterba
2020-07-27 17:25 ` Goffredo Baroncelli
2020-07-27 17:34 ` Goffredo Baroncelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200725023752.GF5890@hungrycats.org \
--to=ce3g8jdj@umail.furryterror.org \
--cc=kreijack@inwind.it \
--cc=linux-btrfs@vger.kernel.org \
--cc=lists@colorremedies.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox