Re: 5 _thousand_ snapshots? even 160? (was: device balance times)

From: Zygo Blaxell <zblaxell@furryterror.org>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: 5 _thousand_ snapshots? even 160? (was: device balance times)
Date: Wed, 22 Oct 2014 16:08:12 -0400
Message-ID: <20141022200812.GA17395@hungrycats.org>
In-Reply-To: <pan$66073$b8603a2$c915bfb9$cd573e99@cox.net>

On Wed, Oct 22, 2014 at 07:41:32AM +0000, Duncan wrote:
> Tomasz Chmielewski posted on Wed, 22 Oct 2014 09:14:14 +0200 as excerpted:
> >> Tho that is of course per subvolume.  If you have multiple subvolumes
> >> on the same filesystem, that can still end up being a thousand or two
> >> snapshots per filesystem.  But those are all groups of something under
> >> 300 (under 100 with hourly) highly connected to each other, with the
> >> interweaving inside each of those groups being the real complexity in
> >> terms of btrfs management.
> 
> IOW, if you thin down the snapshots per subvolume to something reasonable 
> (under 300 for sure, preferably under 100), then depending on the number 
> of subvolumes you're snapshotting, you might have a thousand or two.  
> However, of those couple thousand, btrfs will only have to deal with the 
> under 300 and preferably well under a hundred in the same group, that are 
> snapshots of the same thing and thus related to each other, at any given 
> time.  The other snapshots will be there but won't be adding to the 
> complexity near as much since they're of different subvolumes and aren't 
> logically interwoven together with the ones being considered at that 
> moment.
> 
> But even then, at say 250 snapshots per subvolume, 2000 snapshots is 8 
> independent subvolumes.  That could happen.  But 5000 snapshots?  That'd 
> be 20 independent subvolumes, which is heading toward the extreme again.  
> Yes it could happen, but better if it does to cut down on the per-
> subvolume snapshots further, to say the 25 per subvolume I mentioned, or 
> perhaps even further.  25 snapshots per subvolume with those same 20 
> subvolumes... 500 snapshots total instead of 5000. =:^)

If you have one subvolume per user and 1000 user directories on a server,
5000 snapshots is only 5 per user (last hour, last day, last week, last
month, and last year).  I hear this is a normal use case in the ZFS world.
It would certainly be attractive if there were working quota support.
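
The arithmetic behind these totals can be checked with a few lines of
Python.  This is only a sketch: the function name and the RETENTION list
are made up for illustration and have nothing to do with btrfs itself.

```python
# Hypothetical helper: filesystem-wide snapshot count when every
# subvolume keeps the same number of snapshots.
def total_snapshots(subvolumes: int, per_subvolume: int) -> int:
    return subvolumes * per_subvolume

# The five retention points named above, one snapshot each.
RETENTION = ["last hour", "last day", "last week", "last month", "last year"]

# One subvolume per user, 1000 users, 5 snapshots each.
print(total_snapshots(1000, len(RETENTION)))  # 5000

# The figures from the quoted text: 20 subvolumes at 250 vs. 25 each.
print(total_snapshots(20, 250))  # 5000
print(total_snapshots(20, 25))   # 500
```

The same filesystem-wide total can come from very different shapes: many
subvolumes with a handful of snapshots each, or a few subvolumes with
hundreds each.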

I have datasets where I record 14000+ snapshots of filesystem directory
trees scraped from test machines and aggregated onto a single server
for deduplication...but I store each snapshot as a git commit, not as
a btrfs snapshot or even subvolume.

We do sometimes run queries like "in the last two years, how many times
did $CONDITION occur?" which will scan a handful of files in all of the
snapshots.  The use case itself isn't unreasonable, although using the
filesystem instead of a more domain-specific tool to achieve it may be.
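
For the git-backed variant, a query of that kind could look roughly like
the sketch below.  It assumes one commit per snapshot on a single branch;
the function names, the repository path, the file paths and the pattern
are all hypothetical, and this is not the actual tooling.  It uses only
`git rev-list --since` and `git show <commit>:<path>`.

```python
import re
import subprocess

def list_commits(repo, since="2 years ago"):
    """Return hashes of commits reachable from HEAD and newer than `since`."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", f"--since={since}", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()

def read_file(repo, commit, path):
    """Return the file's text at `commit`, or None if git cannot show it."""
    proc = subprocess.run(
        ["git", "-C", repo, "show", f"{commit}:{path}"],
        capture_output=True, text=True,
    )
    return proc.stdout if proc.returncode == 0 else None

def count_occurrences(commits, paths, pattern, reader):
    """Count the commits in which at least one of `paths` matches `pattern`.

    `reader(commit, path)` returns the file text or None; passing it in
    keeps the counting logic independent of git.
    """
    regex = re.compile(pattern)
    hits = 0
    for commit in commits:
        for path in paths:
            text = reader(commit, path)
            if text is not None and regex.search(text):
                hits += 1
                break  # count each snapshot at most once
    return hits
```

Usage against a real repository would be along the lines of
`count_occurrences(list_commits(repo), paths, pattern, lambda c, p: read_file(repo, c, p))`,
with `repo`, `paths` and `pattern` supplied by the caller.  Spawning one
`git show` per file per commit is slow at 14000+ commits; it is meant to
show the shape of the query, not to be fast.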
