public inbox for linux-btrfs@vger.kernel.org
* Suggestion: Anti-fragmentation safety catch (RFC)
@ 2014-03-24 19:47 Martin
  2014-03-24 20:19 ` Duncan
  0 siblings, 1 reply; 4+ messages in thread
From: Martin @ 2014-03-24 19:47 UTC
  To: linux-btrfs

Just an idea:


btrfs Problem:

I've had two systems die with huge load averages >100(!) in a case
where a user program has, unexpectedly to me, been doing 'database'-like
operations and caused multiple files to become heavily fragmented. The
system eventually dies when data can no longer be added to the
fragmented files as fast as the real-time data collection delivers it.

My example case is for two systems with btrfs raid1 using two HDDs each.
Normal write speed is about 100 MByte/s. After heavy fragmentation, the
CPUs are at 100% I/O wait and throughput is a few hundred kByte/s.


Possible fix:

btrfs checks the ratio of filesize versus number of fragments and for a
bad ratio either:

1: Performs a non-cow copy to defragment the file;

2: Turns off cow for that file and gives a syslog warning for that;

3: Automatically defragments the file.
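
A minimal userspace sketch of the ratio check described above, not how
btrfs itself would do it. It assumes the extent count comes from
filefrag(8) output of the form "FILE: N extents found"; the 64 KiB
average-extent threshold and all function names are hypothetical, chosen
only for illustration.

```python
import re

# Hypothetical threshold: treat an average extent below 64 KiB as "bad".
MIN_AVG_EXTENT_BYTES = 64 * 1024


def parse_filefrag_extents(output: str) -> int:
    """Extract N from a filefrag summary line like 'file: N extents found'."""
    match = re.search(r":\s*(\d+) extents? found", output)
    if match is None:
        raise ValueError(f"unrecognised filefrag output: {output!r}")
    return int(match.group(1))


def average_extent_bytes(size_bytes: int, extent_count: int) -> float:
    """File size divided by fragment count; a single extent if count is 0."""
    if extent_count <= 0:
        return float(size_bytes)
    return size_bytes / extent_count


def is_badly_fragmented(size_bytes: int, extent_count: int,
                        min_avg: int = MIN_AVG_EXTENT_BYTES) -> bool:
    """Flag files larger than the threshold whose average extent is below it."""
    if size_bytes <= min_avg:
        return False  # small files cannot meaningfully fail the ratio
    return average_extent_bytes(size_bytes, extent_count) < min_avg
```

For example, a 1 GiB file in 25000 extents averages about 42 KiB per
extent and would be flagged, while the same file in 100 extents would not.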



Or?


For my case, I'm not sure "2" is a good idea in case the user is
rattling through a gazillion files and the syslog gets swamped.

Unfortunately, I don't know beforehand which files to mark no-cow unless
I no-cow everything for the user/applications.


Thoughts?


Thanks,
Martin


* Re: Suggestion: Anti-fragmentation safety catch (RFC)
  2014-03-24 19:47 Suggestion: Anti-fragmentation safety catch (RFC) Martin
@ 2014-03-24 20:19 ` Duncan
  2014-03-25  0:57   ` Martin
  0 siblings, 1 reply; 4+ messages in thread
From: Duncan @ 2014-03-24 20:19 UTC
  To: linux-btrfs

Martin posted on Mon, 24 Mar 2014 19:47:34 +0000 as excerpted:

> Possible fix:
> 
> btrfs checks the ratio of filesize versus number of fragments and for a
> bad ratio either: [...]

> 3: Automatically defragments the file.

See the autodefrag mount option.

=:^)
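
A small sketch for checking whether a mounted btrfs filesystem already
has the option set, by reading the options column of /proc/mounts. The
function name and the sample mount lines are illustrative assumptions.

```python
def btrfs_autodefrag_status(mounts_text: str) -> dict:
    """Map each btrfs mount point to True if 'autodefrag' is in its options.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "btrfs":
            continue  # skip malformed lines and non-btrfs filesystems
        mount_point = fields[1]
        options = fields[3].split(",")
        status[mount_point] = "autodefrag" in options
    return status


# Made-up sample input; on a live system read /proc/mounts instead.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /data btrfs rw,relatime,space_cache,autodefrag 0 0\n"
    "/dev/sdc1 /backup btrfs rw,relatime 0 0\n"
)

print(btrfs_autodefrag_status(SAMPLE_MOUNTS))
```

The option itself can be enabled on a mounted filesystem with something
like "mount -o remount,autodefrag <mountpoint>", or added to the options
field in fstab.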

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Suggestion: Anti-fragmentation safety catch (RFC)
  2014-03-24 20:19 ` Duncan
@ 2014-03-25  0:57   ` Martin
  2014-03-25 15:42     ` Duncan
  0 siblings, 1 reply; 4+ messages in thread
From: Martin @ 2014-03-25  0:57 UTC
  To: linux-btrfs

On 24/03/14 20:19, Duncan wrote:
> Martin posted on Mon, 24 Mar 2014 19:47:34 +0000 as excerpted:
> 
>> Possible fix:
>>
>> btrfs checks the ratio of filesize versus number of fragments and for a
>> bad ratio either: [...]
> 
>> 3: Automatically defragments the file.
> 
> See the autodefrag mount option.
> 
> =:^)

Thanks for that!

So...

https://btrfs.wiki.kernel.org/index.php/Mount_options
####
autodefrag (since [kernel] 3.0)

Will detect random writes into existing files and kick off background
defragging. It is well suited to bdb or sqlite databases, but not
virtualization images or big databases (yet). Once the developers make
sure it doesn't defrag files over and over again, they'll move this
toward the default.
####

Looks like I might be a good test case :-)


What's the problem for big images or big databases? What is considered
"big"?

Thanks,
Martin


* Re: Suggestion: Anti-fragmentation safety catch (RFC)
  2014-03-25  0:57   ` Martin
@ 2014-03-25 15:42     ` Duncan
  0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2014-03-25 15:42 UTC
  To: linux-btrfs

Martin posted on Tue, 25 Mar 2014 00:57:05 +0000 as excerpted:

> https://btrfs.wiki.kernel.org/index.php/Mount_options

> #### autodefrag (since [kernel] 3.0)
> 
> Will detect random writes into existing files and kick off background
> defragging. It is well suited to bdb or sqlite databases, but not
> virtualization images or big databases (yet). Once the developers make
> sure it doesn't defrag files over and over again, they'll move this
> toward the default.
> ####
> 
> Looks like I might be a good test case :-)
> 
> 
> What's the problem for big images or big databases? What is considered
> "big"?

"Big" is obviously relative and may depend to some extent on the physical 
device backing the filesystem, particularly SSD vs. spinning rust, as 
well as just how actively rewritten the file in question actually is.

Based on my own experience and what I've seen posted from others, 
autodefrag seems to work reasonably well into the lower hundreds of MiB, 
while once we're talking "gigs", something like the NOCOW file attribute 
tends to be a better solution.
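
A minimal sketch of applying NOCOW from a script, assuming the usual
approach of setting +C on a directory so that files created in it
afterwards inherit the attribute (per chattr(1), the flag should be set
on new or empty files). The helper names and the /srv/db path are
hypothetical, and the lsattr line format is assumed to be a flags field
followed by the path.

```python
import subprocess


def chattr_nocow_command(directory: str) -> list:
    """Build the chattr invocation that sets the No_COW flag on a directory."""
    return ["chattr", "+C", directory]


def has_nocow_flag(lsattr_line: str) -> bool:
    """Check one line of `lsattr -d` output for the 'C' (No_COW) flag.

    Only the first whitespace-separated field (the flags) is inspected,
    and the check is case-sensitive: lowercase 'c' means compression.
    """
    flags = lsattr_line.split(maxsplit=1)[0]
    return "C" in flags


def mark_directory_nocow(directory: str) -> None:
    """Run chattr +C; needs a btrfs filesystem and suitable permissions."""
    subprocess.run(chattr_nocow_command(directory), check=True)


# Hypothetical path; nothing is executed here, only the command is built.
print(chattr_nocow_command("/srv/db"))
```

Existing files in the directory are not converted by this; only files
created after the flag is set pick it up.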

Sizes of say half a gig to a gig are a gray area.  Autodefrag will 
probably work well enough on them for fast media (SSD) or if the file 
rewrite requests aren't coming in /too/ fast.  But on slower spinning 
rust, or where internal file data rewrites are coming in fast, rewriting 
the entire multi-hundred-megabyte file to defrag it every time an update 
of a few bytes comes in will likely bottleneck the system.  The effect is 
much like the one you posted to start this thread: a load average 
climbing into the hundreds with CPUs @ 100% IO-wait, due to the 
write-magnification effect as a full several-hundred-megabyte file gets 
repeatedly rewritten for each update of a few bytes!
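
A back-of-the-envelope sketch of that write-magnification effect. It
takes the description above at face value (the whole file is rewritten
for each small update) and ignores reads, metadata and raid1 mirroring.
The 500 MiB file size, the 64-byte update and the function names are
assumptions for illustration; the 100 MByte/s figure is the normal write
speed quoted at the start of the thread.

```python
MIB = 1024 * 1024


def write_amplification(file_bytes: int, update_bytes: int) -> float:
    """Bytes written to disk per byte of application data updated."""
    return file_bytes / update_bytes


def seconds_per_rewrite(file_bytes: int, disk_bytes_per_sec: float) -> float:
    """Time to rewrite the whole file once at the given sequential speed."""
    return file_bytes / disk_bytes_per_sec


file_size = 500 * MIB          # assumed "gray area" file size
update_size = 64               # assumed size of one small update, in bytes
disk_speed = 100 * 10 ** 6     # about 100 MByte/s, as quoted in the thread

print(f"amplification: {write_amplification(file_size, update_size):,.0f}x")
print(f"time per full rewrite: {seconds_per_rewrite(file_size, disk_speed):.2f} s")
```

Under these assumptions each full rewrite takes roughly five seconds, so
updates arriving more often than about once every five seconds could not
be kept up with even at full sequential speed.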

Actually, if your use-case ends up being in or near that gray area, I'm 
sure some specific tests and hard numbers would be appreciated!  Maybe 
autodefrag is fine to 1.5 GiB or so, or perhaps the trouble starts at say 
300 MiB for you, because your system is slow enough and the incoming data 
stream fast enough to bottleneck there.  Or perhaps the half-gig to 1-gig 
range is right on.  Regardless, if you can get hard data on it, please do 
share. =:^)

Meanwhile, the NOCOW file attribute (chattr +C) mentioned a couple 
paragraphs up is recommended once the problem scales beyond what 
autodefrag can handle.  There are, however, a number of btrfs-specific 
peculiarities to NOCOW that take some familiarity with the topic to 
navigate cleanly.  That's out of scope for this post, and besides, there 
are quite a few other threads where it has been discussed, so I'll punt 
on that discussion for now.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

