From: ST <smntov@gmail.com>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Several questions regarding btrfs
Date: Wed, 01 Nov 2017 16:05:53 +0200
Message-ID: <1509545153.1662.105.camel@gmail.com>
In-Reply-To: <ea097624-d485-9423-387f-3c9427508883@gmail.com>
> >>> 3. in my current ext4-based setup I have two servers, where one syncs
> >>> the files of a certain dir to the other using lsyncd (which launches
> >>> rsync on inotify events). As far as I understand, it is more efficient
> >>> to use btrfs send/receive (over ssh) than rsync (over ssh) to sync two
> >>> boxes. Do you think it would be possible to make lsyncd use btrfs for
> >>> syncing instead of rsync? I.e. can btrfs work with inotify events? Has
> >>> somebody tried it already?
> >> BTRFS send/receive needs a read-only snapshot to send from. This means
> >> that triggering it on inotify events is liable to cause performance
> >> issues and possibly lose changes.
> >
> > Actually, triggering doesn't happen on each and every inotify event.
> > lsyncd has an option to define a time interval within which all inotify
> > events are accumulated, and only then is rsync launched. It could be 5-10
> > seconds or more, which is quasi-real-time sync. Do you still hold that
> > it will not work with BTRFS send/receive (i.e. keeping the previous
> > snapshot around and creating a new one)?
> Okay, I actually didn't know that. Depending on how lsyncd invokes
> rsync though (does it call rsync with the exact changed paths or just on
> the whole directory?), it may still be less efficient to use BTRFS
> send/receive.
I assume on the whole directory, but I'm not sure...
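For what it's worth, the send/receive based sync I have in mind would be
roughly the following per interval (only a sketch; /data, the snapshot
naming and the "backup" host are placeholders, and it assumes an initial
snapshot was already transferred in full with btrfs send without -p):

    # pick the latest existing snapshot as the incremental parent
    prev=$(ls -1d /data/.snapshots/* | tail -n 1)
    # take a new read-only snapshot of the subvolume being synced
    new="/data/.snapshots/$(date +%F-%H%M%S)"
    btrfs subvolume snapshot -r /data "$new"
    # send only the difference against the previous snapshot
    btrfs send -p "$prev" "$new" | ssh backup 'btrfs receive /backup/snapshots'
    # older snapshots can be deleted on both sides afterwards, as long as
    # the latest common one is kept as the parent for the next run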
> >>> 4. In a case where compression is used - what is quota based on: (a) the
> >>> amount of GBs the data actually consumes on the hard drive in its
> >>> compressed state, or (b) the amount of GBs the data occupies in its
> >>> uncompressed form? I need to set quotas as in (b). Is it possible? If
> >>> not - should I file a feature request?
> >> I can't directly answer this as I don't know myself (I don't use
> >> quotas), but I have two comments I would suggest you consider:
> >>
> >> 1. qgroups (the BTRFS quota implementation) cause scaling and
> >> performance issues. Unless you absolutely need quotas (you usually
> >> don't, unless you're a hosting company or are dealing with users who
> >> don't listen and don't pay attention to disk usage), you're almost
> >> certainly better off disabling them for now, especially for a
> >> production system.
> >
> > Ok. I'll use more standard approaches. Which of the following commands
> > will work with BTRFS:
> >
> > https://debian-handbook.info/browse/stable/sect.quotas.html
> None, qgroups are the only option right now with BTRFS, and it's pretty
> likely to stay that way since the internals of the filesystem don't fit
> well within the semantics of the regular VFS quota API. However,
> provided you're not using huge numbers of reflinks and subvolumes, you
> should be fine using qgroups.
I want to have 7 daily (or 7+4) read-only snapshots per user, for ca.
100 users. I don't expect users to invoke cp --reflink or take
snapshots.
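If I understand correctly, the per-user setup would then roughly be (a
sketch; /home, the user name and the 10G limit are placeholders):

    # enable quota/qgroup tracking once for the filesystem
    btrfs quota enable /home
    # each user's home is its own subvolume with a size limit
    btrfs subvolume create /home/alice
    btrfs qgroup limit 10G /home/alice

As far as I understand, the daily read-only snapshots would get qgroups
of their own, so they would have to be grouped under a higher-level
qgroup if they should count against the same limit.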
>
> However, it's important to know that if your users have shell access,
> they can bypass qgroups. Normal users can create subvolumes, and new
> subvolumes aren't added to an existing qgroup by default (and unless I'm
> mistaken, aren't constrained by the qgroup set on the parent subvolume),
> so simple shell access is enough to bypass quotas.
I have never done this before, but shouldn't it be possible to just
whitelist the commands users are allowed to run in the SSH config (and
so block creation of subvolumes / cp --reflink)? I actually would have
restricted users to sftp if I knew how to let them change their
passwords whenever they wish. As far as I know that is not possible
with OpenSSH...
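E.g. something along these lines in sshd_config (just a sketch; the
group name and chroot path are placeholders):

    # members of the "sftponly" group get the built-in SFTP server only,
    # so they cannot run btrfs or cp at all
    Match Group sftponly
        ForceCommand internal-sftp
        # the chroot directory must be owned by root and not user-writable
        ChrootDirectory /srv/sftp/%u
        AllowTcpForwarding no
        X11Forwarding no

Though that still leaves the password-change problem I mentioned.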
> >>
> >> 2. Compression and quotas cause issues regardless of how they interact.
> >> In case (a), the user has no way of knowing if a given file will fit
> >> under their quota until they try to create it. In case (b), actual disk
> >> usage (as reported by du) will not match up with what the quota says the
> >> user is using, which makes it harder for them to figure out what to
> >> delete to free up space. It's debatable which is the less objectionable
> >> situation for users, though most people I know tend to feel that the
> >> issue with (a) doesn't matter, while the issue with (b) does.
> >
> > I think both (a) and (b) should be possible, and it should be up to the
> > sysadmin to choose what he prefers. The concerns about the (b) scenario
> > could probably be dealt with by adding some sort of --real-size option
> > to the du command, while by default du could keep its current behavior
> > (which might be made explicit with --compressed-size).
> Reporting anything but the compressed size by default in du would mean
> it doesn't behave as existing software expects it to. It's supposed to
> report actual disk usage (in contrast to the sum of file sizes), which
> means, for example, that a 1G sparse file with only 64k of data is
> supposed to be reported as 64k by du.
Yes, it shouldn't be the default behavior, but an optional one...
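Something like the distinction du already makes for sparse files, e.g.
(a sketch):

    truncate -s 1G sparse.img
    du -h sparse.img                  # reports ~0: blocks actually allocated
    du -h --apparent-size sparse.img  # reports 1.0G: the logical size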
> > Two more questions came to my mind: as I've mentioned above, I have two
> > boxes, one syncing to the other. No RAID involved. I want to scrub (or
> > scan - I don't know yet what the difference is...) the whole filesystem
> > once a month to look for bitrot. Questions:
> >
> > 1. is it a stable setup for production? Let's say I'll sync with rsync,
> > either via cron or via lsyncd?
> Reasonably, though depending on how much data you have and other
> environmental constraints, you may want to scrub a bit more frequently.
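Understood. I was planning something like this in cron (a sketch;
/srv/data stands in for the actual mount point):

    # /etc/cron.d/btrfs-scrub - scrub at 03:00 on the 1st of every month
    0 3 1 * *  root  /usr/bin/btrfs scrub start -B /srv/data
    # progress/results can be checked with: btrfs scrub status /srv/data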
> > 2. should any data corruption be discovered - is there any way to heal
> > it using the copy from the other box over SSH?
> Provided you know which file is affected, yes, you can fix it by just
> copying the file back from the other system.
Ok, but there is no automatic fixing in such a case, right?
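In that case I'd presumably do something like the following by hand (a
sketch; the path and host name are placeholders):

    # scrub normally names the affected file in the kernel log when it
    # hits a checksum error it cannot repair
    dmesg | grep -i 'checksum error'
    # then overwrite the damaged copy with the good one from the other box
    scp otherbox:/srv/data/path/to/file /srv/data/path/to/file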