* Suggestion: Anti-fragmentation safety catch (RFC)
From: Martin @ 2014-03-24 19:47 UTC
To: linux-btrfs

Just an idea:

btrfs Problem:

I've had two systems die with huge load factors >100(!) in cases where a
user program has, unexpectedly to me, been doing 'database'-like
operations and caused multiple files to become heavily fragmented. The
system eventually dies when data can no longer be written to the
fragmented files as fast as the real-time data collection produces it.

My example case is for two systems with btrfs raid1 using two HDDs each.
Normal write speed is about 100MByte/s. After heavy fragmentation, the
CPUs are at 100% wait and I/O is a few hundred kByte/s.

Possible fix:

btrfs checks the ratio of filesize versus number of fragments and for a
bad ratio either:

1: Performs a non-cow copy to defragment the file;
2: Turns off cow for that file and gives a syslog warning for that;
3: Automatically defragments the file.

Or?

For my case, I'm not sure "2" is a good idea in case the user is
rattling through a gazillion files and the syslog gets swamped.

Unfortunately, I don't know beforehand what files to mark no-cow unless
I no-cow the entire user/applications.

Thoughts?

Thanks,
Martin
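
The ratio check proposed above could be sketched roughly as follows. This is only an illustration of the idea, not anything btrfs implements: the function names and the 256 KiB threshold are hypothetical, and the size and fragment count are assumed to come from some external source (for example, a tool that reports a file's extent count).

```python
# Hypothetical threshold: flag files whose average fragment is smaller than this.
# The value is an assumption chosen for illustration, not a btrfs setting.
MIN_AVG_EXTENT_BYTES = 256 * 1024


def average_extent_bytes(file_size_bytes: int, extent_count: int) -> float:
    """Return the average number of bytes per fragment (extent)."""
    if extent_count <= 0:
        raise ValueError("extent_count must be positive")
    return file_size_bytes / extent_count


def is_badly_fragmented(
    file_size_bytes: int,
    extent_count: int,
    min_avg_extent_bytes: int = MIN_AVG_EXTENT_BYTES,
) -> bool:
    """Apply the 'filesize versus number of fragments' ratio test."""
    # A file no larger than the threshold cannot be meaningfully fragmented.
    if file_size_bytes <= min_avg_extent_bytes:
        return False
    return average_extent_bytes(file_size_bytes, extent_count) < min_avg_extent_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # 1 GiB split into 100,000 fragments versus 10 fragments.
    print(is_badly_fragmented(one_gib, 100_000))
    print(is_badly_fragmented(one_gib, 10))
```

A check of this shape only decides whether a file looks bad; which of the three actions to take (non-cow copy, turning off cow, or defragmenting) would remain a separate policy question.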
* Re: Suggestion: Anti-fragmentation safety catch (RFC)
From: Duncan @ 2014-03-24 20:19 UTC
To: linux-btrfs

Martin posted on Mon, 24 Mar 2014 19:47:34 +0000 as excerpted:

> Possible fix:
>
> btrfs checks the ratio of filesize versus number of fragments and for a
> bad ratio either: [...]

> 3: Automatically defragments the file.

See the autodefrag mount option.

=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: Suggestion: Anti-fragmentation safety catch (RFC)
From: Martin @ 2014-03-25 0:57 UTC
To: linux-btrfs

On 24/03/14 20:19, Duncan wrote:
> Martin posted on Mon, 24 Mar 2014 19:47:34 +0000 as excerpted:
>
>> Possible fix:
>>
>> btrfs checks the ratio of filesize versus number of fragments and for a
>> bad ratio either: [...]
>
>> 3: Automatically defragments the file.
>
> See the autodefrag mount option.
>
> =:^)

Thanks for that!

So...

https://btrfs.wiki.kernel.org/index.php/Mount_options

####
autodefrag (since [kernel] 3.0)

Will detect random writes into existing files and kick off background
defragging. It is well suited to bdb or sqlite databases, but not
virtualization images or big databases (yet). Once the developers make
sure it doesn't defrag files over and over again, they'll move this
toward the default.
####

Looks like I might be a good test case :-)

What's the problem for big images or big databases? What is considered
"big"?

Thanks,
Martin
* Re: Suggestion: Anti-fragmentation safety catch (RFC)
From: Duncan @ 2014-03-25 15:42 UTC
To: linux-btrfs

Martin posted on Tue, 25 Mar 2014 00:57:05 +0000 as excerpted:

> https://btrfs.wiki.kernel.org/index.php/Mount_options

> ####
> autodefrag (since [kernel] 3.0)
>
> Will detect random writes into existing files and kick off background
> defragging. It is well suited to bdb or sqlite databases, but not
> virtualization images or big databases (yet). Once the developers make
> sure it doesn't defrag files over and over again, they'll move this
> toward the default.
> ####
>
> Looks like I might be a good test case :-)
>
>
> What's the problem for big images or big databases? What is considered
> "big"?

"Big" is obviously relative and may depend to some extent on the
physical device backing the filesystem, particularly SSD vs. spinning
rust, as well as just how actively rewritten the file in question
actually is. Based on my own experience and what I've seen posted from
others, autodefrag seems to work reasonably well into the lower hundreds
of MiB, while once we're talking "gigs", something like the NOCOW file
attribute tends to be a better solution.

Sizes of say half a gig to a gig are a gray area. Autodefrag will
probably work well enough on them for fast media (SSD) or if the file
rewriting requests aren't coming in /too/ fast, but on slower spinning
rust or where internal file data rewrites are coming fast, rewriting the
entire multi-hundred-megabyte file to defrag it every time an update of
a few bytes comes in will likely bottleneck the system, with an effect
much like the one you posted to start this thread: a load average
increasing into the hundreds due to IO-bottleneck with CPUs @ 100% wait,
due to the write-magnification effect as a full several hundred megabyte
file gets repeatedly rewritten for each update of a few bytes!
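
To put rough numbers on the write-magnification effect described above, here is a back-of-the-envelope sketch. It takes the worst-case model in the preceding paragraph at face value (every small update causes the whole file to be rewritten), which is a simplification and may not match how autodefrag actually behaves. The function names, the 300 MiB file and the 16-byte update are hypothetical; the write speed approximates the 100MByte/s mentioned at the start of the thread.

```python
MIB = 1024 * 1024


def write_amplification(file_size_bytes: int, update_bytes: int) -> float:
    """Bytes physically written per byte of logical update, assuming the
    whole file is rewritten for each update (worst-case model)."""
    if update_bytes <= 0:
        raise ValueError("update_bytes must be positive")
    return file_size_bytes / update_bytes


def max_sustainable_updates_per_second(
    file_size_bytes: int, disk_write_bytes_per_s: float
) -> float:
    """Upper bound on the update rate before rewrites start to queue up,
    assuming each update costs one full sequential rewrite of the file."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return disk_write_bytes_per_s / file_size_bytes


if __name__ == "__main__":
    file_size = 300 * MIB   # hypothetical gray-area file
    update = 16             # hypothetical "few bytes" update
    disk_speed = 100 * MIB  # roughly the normal write speed quoted in the thread

    print(write_amplification(file_size, update))
    print(max_sustainable_updates_per_second(file_size, disk_speed))
```

Under these assumptions the disk keeps up with only about one update every three seconds; anything arriving faster piles up as pending IO, which is consistent with a load average that keeps climbing.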

Actually, if your use-case ends up being in or near that gray area, I'm
sure some specific tests and hard numbers would be appreciated! Maybe
autodefrag is fine to 1.5 GiB or so, or perhaps the trouble starts at
say 300 MiB for you as your system is slow enough and the incoming data
stream high enough you're bottlenecking at 300 MiB. Or perhaps the
half-gig to 1-gig range is right on. Regardless, if you can get hard
data on it, please do share. =:^)

Meanwhile, the NOCOW extended file-attribute (chattr +C) mentioned a
couple paragraphs up is recommended once the problem scales beyond what
autodefrag can handle. There are, however, a number of btrfs specific
peculiarities to the NOCOW situation that it can take some familiarity
with the topic to cleanly navigate. That's out of scope for this post
and besides, there's quite a few other threads where it has been
discussed, so I'll punt on that discussion, for now.