* btrfs balance problems
From: James Courtier-Dutton @ 2017-12-23 11:19 UTC
To: linux-btrfs

Hi,

During a btrfs balance, the process hogs all CPU.
Or, to be exact, any other program that wishes to use the SSD during a
btrfs balance is blocked for long periods. Long periods being more
than 5 seconds.
Is there any way to multiplex SSD access while btrfs balance is
operating, so that other applications can still access the SSD with
relatively low latency?

My guess is that btrfs is doing a transaction with a large number of
SSD blocks at a time, and thus blocking other applications.

This makes for atrocious user interactivity as well as applications
failing because they cannot access the disk in a relatively
low-latency manner.
For example, this is causing a High Definition network CCTV
application to fail.

What I would really like is for some way to limit SSD bandwidth to
applications.
For example the CCTV app always gets the bandwidth it needs, and all
other applications can still access the SSD, but are rate limited.
This would fix my particular problem.
We have rate limiting for network applications, why not disk access also?

Kind Regards

James
* Re: btrfs balance problems
From: Alberto Bursi @ 2017-12-23 11:56 UTC
To: James Courtier-Dutton, linux-btrfs@vger.kernel.org

On 12/23/2017 12:19 PM, James Courtier-Dutton wrote:
> Hi,
>
> During a btrfs balance, the process hogs all CPU.
> Or, to be exact, any other program that wishes to use the SSD during a
> btrfs balance is blocked for long periods. Long periods being more
> than 5 seconds.
> Is there any way to multiplex SSD access while btrfs balance is
> operating, so that other applications can still access the SSD with
> relatively low latency?
>
> My guess is that btrfs is doing a transaction with a large number of
> SSD blocks at a time, and thus blocking other applications.
>
> This makes for atrocious user interactivity as well as applications
> failing because they cannot access the disk in a relatively
> low-latency manner.
> For example, this is causing a High Definition network CCTV
> application to fail.
>
> What I would really like is for some way to limit SSD bandwidth to
> applications.
> For example the CCTV app always gets the bandwidth it needs, and all
> other applications can still access the SSD, but are rate limited.
> This would fix my particular problem.
> We have rate limiting for network applications, why not disk access also?
>
> Kind Regards
>
> James
>

On most I/O intensive programs in Linux you can use the "ionice" tool to
change the disk access priority of a process. [1]
This allows me to run I/O intensive background scripts in servers
without the users noticing slowdowns or lagging; of course, this means
the process doing heavy I/O will run more slowly or get outright paused
if higher-priority processes need a lot of access to the disk.

It works on btrfs balance too, see (commandline example) [2].

If you don't start the process with ionice as in [2], you can always
change the priority later if you get the process ID. I use iotop
[3], which also supports commandline arguments to integrate its output
in scripts.

For btrfs scrub it seems to be possible to specify the ionice options
directly, while btrfs balance does not seem to have them (would be nice
to add them imho). [4]

For the sake of completeness, there is also the "nice" tool for CPU usage
priority (also used in my scripts on servers to keep the scripts from
hogging the CPU for what is just a background process, and seen in [2]
commandline too). [5]

1. http://man7.org/linux/man-pages/man1/ionice.1.html
2. https://unix.stackexchange.com/questions/390480/nice-and-ionice-which-one-should-come-first
3. http://man7.org/linux/man-pages/man8/iotop.8.html
4. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub
5. http://man7.org/linux/man-pages/man1/nice.1.html

-Alberto
[parent not found: <CAAMvbhHV=BvRLv14U0JRrYmhiXeREOTNiVLPkuq=MO6dH4jDiQ@mail.gmail.com>]
* Re: btrfs balance problems
From: James Courtier-Dutton @ 2017-12-27 21:39 UTC
To: linux-btrfs

Hi,

Thank you for your suggestion.
It does not help at all.
btrfs balance's behaviour seems to be unchanged by ionice.
It still takes 100% while working and starves all other processes of
disk access.

How can I get btrfs balance to work in the background, without adversely
affecting other applications?

> On 23 December 2017 at 11:56, Alberto Bursi <alberto.bursi@outlook.it> wrote:
>>
>> On 12/23/2017 12:19 PM, James Courtier-Dutton wrote:
>>> Hi,
>>>
>>> During a btrfs balance, the process hogs all CPU.
>>> Or, to be exact, any other program that wishes to use the SSD during a
>>> btrfs balance is blocked for long periods. Long periods being more
>>> than 5 seconds.
>>> Is there any way to multiplex SSD access while btrfs balance is
>>> operating, so that other applications can still access the SSD with
>>> relatively low latency?
>>>
>>> My guess is that btrfs is doing a transaction with a large number of
>>> SSD blocks at a time, and thus blocking other applications.
>>>
>>> This makes for atrocious user interactivity as well as applications
>>> failing because they cannot access the disk in a relatively
>>> low-latency manner.
>>> For example, this is causing a High Definition network CCTV
>>> application to fail.
>>>
>>> What I would really like is for some way to limit SSD bandwidth to
>>> applications.
>>> For example the CCTV app always gets the bandwidth it needs, and all
>>> other applications can still access the SSD, but are rate limited.
>>> This would fix my particular problem.
>>> We have rate limiting for network applications, why not disk access also?
>>>
>>> Kind Regards
>>>
>>> James
>>>
>>
>> On most I/O intensive programs in Linux you can use the "ionice" tool to
>> change the disk access priority of a process. [1]
>> This allows me to run I/O intensive background scripts in servers
>> without the users noticing slowdowns or lagging; of course, this means
>> the process doing heavy I/O will run more slowly or get outright paused
>> if higher-priority processes need a lot of access to the disk.
>>
>> It works on btrfs balance too, see (commandline example) [2].
>>
>> If you don't start the process with ionice as in [2], you can always
>> change the priority later if you get the process ID. I use iotop
>> [3], which also supports commandline arguments to integrate its output
>> in scripts.
>>
>> For btrfs scrub it seems to be possible to specify the ionice options
>> directly, while btrfs balance does not seem to have them (would be nice
>> to add them imho). [4]
>>
>> For the sake of completeness, there is also the "nice" tool for CPU usage
>> priority (also used in my scripts on servers to keep the scripts from
>> hogging the CPU for what is just a background process, and seen in [2]
>> commandline too). [5]
>>
>> 1. http://man7.org/linux/man-pages/man1/ionice.1.html
>> 2. https://unix.stackexchange.com/questions/390480/nice-and-ionice-which-one-should-come-first
>> 3. http://man7.org/linux/man-pages/man8/iotop.8.html
>> 4. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub
>> 5. http://man7.org/linux/man-pages/man1/nice.1.html
>>
>> -Alberto
* Re: btrfs balance problems
From: waxhead @ 2017-12-27 21:54 UTC
To: James Courtier-Dutton, linux-btrfs

James Courtier-Dutton wrote:
> Hi,
>
> Thank you for your suggestion.
> It does not help at all.
> btrfs balance's behaviour seems to be unchanged by ionice.
> It still takes 100% while working and starves all other processes of
> disk access.
>
> How can I get btrfs balance to work in the background, without adversely
> affecting other applications?
>

Are you using block multiqueue or perhaps a scheduler other than CFQ?!
I may be wrong on this, but from memory I think the only scheduler that
plays nicely (from a practical point of view) with ionice is CFQ.

Check the output of:
cat /sys/block/sda/queue/scheduler

Of course you need to replace sda with the appropriate block device, and
the selected scheduler will be shown in [brackets].

If that does not help, you can maybe reduce the time your system is slowed
down by using the usage balance filter and increasing the value bit by bit:
https://btrfs.wiki.kernel.org/index.php/Balance_Filters
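For example, something along these lines (untested from my side; /mnt and
the percentages are only placeholders, adjust them to your own mountpoint
and filesystem):

# only rewrite data block groups that are less than 10% used
btrfs balance start -dusage=10 /mnt
# if that is not enough, repeat with a higher threshold
btrfs balance start -dusage=25 /mnt
btrfs balance start -dusage=50 /mnt

Each run only touches block groups below the given usage threshold, so the
individual runs stay short and the system is not tied up for as long in
one go.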
>> On 23 December 2017 at 11:56, Alberto Bursi <alberto.bursi@outlook.it> wrote:
>>>
>>> On 12/23/2017 12:19 PM, James Courtier-Dutton wrote:
>>>> Hi,
>>>>
>>>> During a btrfs balance, the process hogs all CPU.
>>>> Or, to be exact, any other program that wishes to use the SSD during a
>>>> btrfs balance is blocked for long periods. Long periods being more
>>>> than 5 seconds.
>>>> Is there any way to multiplex SSD access while btrfs balance is
>>>> operating, so that other applications can still access the SSD with
>>>> relatively low latency?
>>>>
>>>> My guess is that btrfs is doing a transaction with a large number of
>>>> SSD blocks at a time, and thus blocking other applications.
>>>>
>>>> This makes for atrocious user interactivity as well as applications
>>>> failing because they cannot access the disk in a relatively
>>>> low-latency manner.
>>>> For example, this is causing a High Definition network CCTV
>>>> application to fail.
>>>>
>>>> What I would really like is for some way to limit SSD bandwidth to
>>>> applications.
>>>> For example the CCTV app always gets the bandwidth it needs, and all
>>>> other applications can still access the SSD, but are rate limited.
>>>> This would fix my particular problem.
>>>> We have rate limiting for network applications, why not disk access also?
>>>>
>>>> Kind Regards
>>>>
>>>> James
>>>>
>>>
>>> On most I/O intensive programs in Linux you can use the "ionice" tool to
>>> change the disk access priority of a process. [1]
>>> This allows me to run I/O intensive background scripts in servers
>>> without the users noticing slowdowns or lagging; of course, this means
>>> the process doing heavy I/O will run more slowly or get outright paused
>>> if higher-priority processes need a lot of access to the disk.
>>>
>>> It works on btrfs balance too, see (commandline example) [2].
>>>
>>> If you don't start the process with ionice as in [2], you can always
>>> change the priority later if you get the process ID. I use iotop
>>> [3], which also supports commandline arguments to integrate its output
>>> in scripts.
>>>
>>> For btrfs scrub it seems to be possible to specify the ionice options
>>> directly, while btrfs balance does not seem to have them (would be nice
>>> to add them imho). [4]
>>>
>>> For the sake of completeness, there is also the "nice" tool for CPU usage
>>> priority (also used in my scripts on servers to keep the scripts from
>>> hogging the CPU for what is just a background process, and seen in [2]
>>> commandline too). [5]
>>>
>>> 1. http://man7.org/linux/man-pages/man1/ionice.1.html
>>> 2. https://unix.stackexchange.com/questions/390480/nice-and-ionice-which-one-should-come-first
>>> 3. http://man7.org/linux/man-pages/man8/iotop.8.html
>>> 4. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub
>>> 5. http://man7.org/linux/man-pages/man1/nice.1.html
>>>
>>> -Alberto
* Re: btrfs balance problems
From: Duncan @ 2017-12-28 0:39 UTC
To: linux-btrfs

James Courtier-Dutton posted on Wed, 27 Dec 2017 21:39:30 +0000 as excerpted:

> Thank you for your suggestion.

Please put your reply in standard list quote/reply-in-context order. It
makes further replies, /in/ /context/, far easier. I've moved the rest
of your reply to do that, but I shouldn't have to...

>> On 23 December 2017 at 11:56, Alberto Bursi <alberto.bursi@outlook.it>
>> wrote:
>>>
>>> On 12/23/2017 12:19 PM, James Courtier-Dutton wrote:
>>>>
>>>> During a btrfs balance, the process hogs all CPU.
>>>> Or, to be exact, any other program that wishes to use the SSD during
>>>> a btrfs balance is blocked for long periods. Long periods being more
>>>> than 5 seconds.

Blocking disk access isn't hogging the CPU, it's hogging the disk IO.
Tho FWIW we don't have many complaints about btrfs hogging /ssd/
access[1], tho we do have some complaining about problems on legacy
spinning rust.

>>>> Is there any way to multiplex SSD access while btrfs balance is
>>>> operating, so that other applications can still access the SSD with
>>>> relatively low latency?
>>>>
>>>> My guess is that btrfs is doing a transaction with a large number of
>>>> SSD blocks at a time, and thus blocking other applications.
>>>>
>>>> This makes for atrocious user interactivity as well as applications
>>>> failing because they cannot access the disk in a relatively
>>>> low-latency manner.
>>>> For example, this is causing a High Definition network CCTV
>>>> application to fail.

That sort of low latency is outside my own use-case, but I do have some
suggestions...

>>>> What I would really like is for some way to limit SSD bandwidth to
>>>> applications.
>>>> For example the CCTV app always gets the bandwidth it needs, and all
>>>> other applications can still access the SSD, but are rate limited.
>>>> This would fix my particular problem.
>>>> We have rate limiting for network applications, why not disk access
>>>> also?
>>>>
>>> On most I/O intensive programs in Linux you can use the "ionice" tool
>>> to change the disk access priority of a process. [1]

AFAIK, ionice only works for some IO schedulers, not all. It does work
with the default CFQ scheduler, but I don't /believe/ it works with
deadline, certainly not with noop, and I'd /guess/ it doesn't work with
block-multiqueue (and thus not with bfq or kyber) at all, tho it's
possible it does in the latest kernels, since multi-queue is targeted to
eventually replace, at least as default, the older single-queue options.

So which scheduler are you using, and are you on multi-queue or not?

Meanwhile, where ionice /does/ work, using normal nice 19 should place
the process in low-priority batch mode, which should automatically lower
the ionice priority as well (the default io priority is derived from the
nice value). That's what I normally use for such things here, on gentoo,
where I schedule my package builds at nice 19, tho I also do the actual
builds on tmpfs, so they don't actually touch anything but memory for the
build itself, only fetching the sources, storing the built binpkg, and
installing it to the main system.
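As for actually answering the scheduler question, checking is quick;
something along these lines should show it (sda is only an example
device, and the second path assumes a SATA/SCSI device on a reasonably
current kernel):

# the active scheduler is the one shown in [brackets]
cat /sys/block/sda/queue/scheduler
# whether the SCSI/SATA stack is using blk-mq (multi-queue)
cat /sys/module/scsi_mod/parameters/use_blk_mq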
>>> This allows me to run I/O intensive background scripts in servers
>>> without the users noticing slowdowns or lagging; of course, this means
>>> the process doing heavy I/O will run more slowly or get outright
>>> paused if higher-priority processes need a lot of access to the disk.
>>>
>>> It works on btrfs balance too, see (commandline example) [2].

There's a problem with that example. See below.

>>> If you don't start the process with ionice as in [2], you can always
>>> change the priority later if you get the process ID. I use
>>> iotop [3], which also supports commandline arguments to integrate its
>>> output in scripts.
>>>
>>> For btrfs scrub it seems to be possible to specify the ionice options
>>> directly, while btrfs balance does not seem to have them (would be
>>> nice to add them imho). [4]
>>>
>>> For the sake of completeness, there is also the "nice" tool for CPU
>>> usage priority (also used in my scripts on servers to keep the scripts
>>> from hogging the CPU for what is just a background process, and seen
>>> in [2] commandline too). [5]
>>>
>>> 1. http://man7.org/linux/man-pages/man1/ionice.1.html
>>> 2. https://unix.stackexchange.com/questions/390480/nice-and-ionice-which-one-should-come-first
>>> 3. http://man7.org/linux/man-pages/man8/iotop.8.html
>>> 4. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub
>>> 5. http://man7.org/linux/man-pages/man1/nice.1.html

> It does not help at all.
> btrfs balance's behaviour seems to be unchanged by ionice.
> It still takes 100% while working and starves all other processes of
> disk access.

100% CPU, or 100% IO? How are you measuring?

If iotop, 99% of time waiting on IO for an IO-bound process isn't bad,
and doesn't mean nothing else can do IO first (tho 99% for that CCTV
process /could/ be a problem, if it's normally much lower and only 99%
because btrfs is taking what it needs).

100% of a CPU on a multicore isn't as big a deal as it used to be on a
single-core, either, not to mention that 100% of a cpu throttled down to
under half-speed is 50% or under at full-speed. And if it's CPU, what
state? Mostly in wait state indicates it's waiting for IO, rather
different than 100% system, user, or niced (plus there's steal and guest
in the virtual context, out of my own use-case so I don't know so much
about it). And near 100% niced shouldn't be a problem since other
processes will come first.

Meanwhile, the problem mentioned above is that it's not terribly
surprising that ionice doesn't help a lot, since for commands such as
btrfs balance, defrag and scrub, the btrfs userspace mostly just sets up
the kernel to do the real job, so throttling the userspace only won't
tend to do what you want. Luckily, scrub has a built-in option to use
ionice, so you don't have to worry about it there, but balance is a
different story...

> How can I get btrfs balance to work in the background, without adversely
> affecting other applications?

I'd actually suggest a different strategy.

What I did here way back when I was still on reiserfs on spinning rust,
where it made more difference than on ssd, but I kept the settings when
I switched to ssd and btrfs, and at least some others have mentioned
that similar settings helped them on btrfs as well, is...
Problem: The kernel virtual-memory subsystem's writeback cache was
originally configured for systems with well under a gigabyte of RAM, and
the defaults no longer work so well on multi-GiB-RAM systems,
particularly above 8 GiB RAM, because they are based on a percentage of
available RAM, and will typically let several GiB of dirty writeback
cache accumulate before kicking off any attempt to actually write it to
storage. On spinning rust, when writeback /does/ finally kick off, this
can result in hogging the IO for well over half a minute at a time,
where 30 seconds also happens to be the default "flush it anyway" time.
On ssd, the problem isn't typically as bad, but it could still be well
over 5 seconds worth, particularly if you're running 32 GiB+ RAM as
large servers often do.

Solution: Adjust the kernel's dirty writeback settings, located in
/proc/sys/vm/, as appropriate. Start by reading the kernel
documentation's...

$KERNELDIR/Documentation/sysctl/vm.txt

Focus on the dirty_* files. If you wish, google some of the files for
other articles on the subject. Then experiment a bit, first by writing
the settings directly into the proc files. When you get settings that
work well for you, use your distro's sysctl configuration, typically
writing the settings to /etc/sysctl.conf or to files in /etc/sysctl.d/,
to make the settings permanent, so they're applied automatically at
every boot.

FWIW, here's what I use in my /etc/sysctl.conf, on a 16 GiB desktop/
workstation system. As I said, I originally set this up for spinning
rust, but it doesn't hurt for ssd either.

# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0
# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0
# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000

As you can see I'm already at 1% for vm.dirty_background_ratio. That
works reasonably for a 16 GiB RAM system, where it's ~160 MiB. Were I to
have more memory, say 32+ GiB, or want to set it lower, to say 128 MiB or
less, I'd need to switch to using the _bytes parameter instead of ratio,
to go under 1% and be more precise.

Adjusting those down from their 10% foreground, 5% background defaults,
over a gig and a half foreground at 16 GiB and over 6 gigs at 64 GiB,
will likely help quite a bit right there... if it's IO, anyway. (The
default 30-second expire time isn't so bad, but while I was there I
decided a 10-second writeback interval was better for me, and I've not
had problems with it, so...)

Tho there is a newer solution that in theory could potentially eliminate
the need for the above: block-multiqueue IO and the kyber (for fast SSDs)
and bfq (for slower spinning rust, thumbdrives, etc.) io-schedulers.
They're eventually supposed to supplant the older single/serial-queue
alternatives. But there's a reason they're not the defaults yet, as
they're still new, still somewhat experimental and potentially buggy, as
well as not yet being as fully featured as the serial/single-queue
defaults. Of course you may wish to try them too.
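If you do want to experiment with them, the scheduler can be switched
per-device at runtime, roughly like this (sda is again only an example;
kyber and bfq only show up as choices when the device is actually on
blk-mq with those modules available, and the change does not survive a
reboot unless you also put it in a boot script or udev rule):

# list the available schedulers, the active one is in [brackets]
cat /sys/block/sda/queue/scheduler
# switch, takes effect immediately for new IO
echo kyber > /sys/block/sda/queue/scheduler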
Actually, I'm trying kyber here, and haven't seen anything major, tho
when I do my mkfs.btrfs and fresh full backup routine, it /may/ be
slightly slower, but not enough for me to bother actually benchmarking
both ways to see for sure, and if it's slower because it's allowing other
things in to do their thing too, that might actually be better.

Meanwhile, switching to btrfs specific, now...

These may or may not apply to your use-case. If they do...

Be aware that certain btrfs features can be convenient, but they come at
a cost. In particular, both quotas and snapshotting (and dedup)
seriously increase btrfs' scaling issues when running commands such as
balance and check.

The running recommendation is to turn off btrfs quotas if you don't
actually need them, as for people who don't, they're simply more trouble
than they're worth. (And until relatively recent kernels, btrfs quotas
were buggy and not particularly reliable as well, tho they're better in
that regard since 4.10 or so... unfortunately I'm not sure if the fixes
hit 4.9-LTS or not.)

If you need quotas, then at least be aware that turning them off
temporarily while doing balance can make a *BIG* difference in
processing time -- for some people the difference is big enough that it
turns a "just forget about balance, it won't complete in a practical
amount of time anyway" job into "balance is actually practical now."
This is because quotas repeatedly recalculate as balance shifts the
block groups around, and turning them off even temporarily allows
balance to do its thing without those repeated recalculations getting in
the way.

** Important missing info: Because my own use-case doesn't need quotas
I've never used them myself, and don't know if you need to do a quota
rescan when turning them back on or not. Perhaps someone who uses them
can fill in that info, and I'll have it the next time.

The problem with both snapshotting and dedup is reflinks. Reflinks
increase the amount of work btrfs must do to maintain them when moving
block groups around, thus increasing scaling issues. While generally
speaking a handful of snapshots per subvolume won't hurt, once it gets
into the hundreds, balance takes *MUCH* longer. Thus, try to keep
snapshots per subvolume under 500 at all costs... if you plan on running
balance or check, anyway[2]... and under triple digits if possible. A
scheduled snapshot-thinning program to match the scheduled snapshotting
program many people (and distros) use goes a long way, here. If you can
do it, 50 snapshots per subvolume should be fine, with minimal scaling
issues.

Dedup has the same reflinking issues, but is harder to quantify, because
people using it often have many more reflinks, but to far fewer files
(or more literally, extents), than is typical of snapshots. I'm not
aware of any specific recommendations there, other than simply to take
the issue into consideration when setting up your dedup.

Between my use-case not using quotas/snapshotting/subvolumes, as I prefer
multiple independent btrfs and full backups, and the above dirty-writeback
sysctl settings, plus being on relatively fast ssds (tho still SATA, not
the fancy direct-PCIe stuff), as I said, no complaints about btrfs
hogging system CPU /or/ IO, here.
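Coming back to the quota point above, the temporary-disable trick would
look roughly like this (untested here since, as noted, I don't run quotas
myself; /mnt and the usage filter value are only placeholders, and
whether a rescan is needed afterwards is exactly the open question
above):

btrfs quota disable /mnt
btrfs balance start -dusage=50 /mnt
btrfs quota enable /mnt
# possibly needed afterwards, see the open question above
# btrfs quota rescan /mnt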
Tho as I also mentioned, the one thing I do regularly that /might/ tie
things up, building package updates on gentoo, I have optimized as well:
nice-19ed for idle/batch priority (which automatically ionices it as
well), and doing the actual build in tmpfs, so it doesn't hit main
storage except for caching the sources/ebuild-tree and built packages,
and actually installing the built package.

OK, hopefully at least /some/ of that helps. The ionice suggestion
wasn't wrong, but if you were facing some of these other issues, it's
not entirely surprising that it didn't help, especially because, by the
posted suggestion, you were trying to ionice the userspace balance
command, when the real trouble was the kernel threads doing the actual
work. Unfortunately, those aren't as easy to ionice, tho in theory it
could be done.[3]

---
[1] Few complaints about IO on SSD: I'm on ssd too and have no
complaints about IO here, tho for my use-case I may not notice 5-second
stalls. 30-second stalls I'd notice, but I've not seen that since I
switched off spinning rust, or actually before, since I tuned my IO, as
above.

Tho my btrfs use-case is rather simple: multiple smallish (mostly under
100 GiB per-device) independent btrfs pair-device raid1, on partitioned
ssds, no subvolumes, snapshots or quotas. At that small size on SSDs,
full balances/scrubs/etc normally take under a minute, so I use the
no-backgrounding option where necessary and normally wait for it to
complete, tho I sometimes switch to doing something else for a minute or
so in the mean time.

Tho of course if something goes really wrong, like an ssd failing, I'll
have multiple btrfs to deal with, as I have it partitioned up, with
multiple pair-device btrfs each using a partition on it for one device
of their pair.

[2] Balance and check reflink costs: Some people just bite the bullet
and don't worry about balance and check times, because with their
use-cases, falling back to backup and redoing the filesystem from
scratch is simpler/faster and more reliable than trying to balance to a
different btrfs layout or check their way out of trouble.

[3] Ionicing btrfs balance kernel worker threads: Simplest would be to
have balance take parameters to hand to the kernel btrfs code to use
when it kicks off the threads, like scrub apparently does. Lacking
that, I can envision some daemon watching for such threads and ionicing
them as it finds them. But that's way more complicated than just
feeding the options to a btrfs balance commandline as can be done with
scrub, and with a bit of luck, especially because you /are/ after all
already running ssd, /may/ be unnecessary once the above suggestions are
taken into account.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs balance problems
From: Kai Krakow @ 2017-12-30 0:34 UTC
To: linux-btrfs

On Thu, 28 Dec 2017 00:39:37 +0000, Duncan wrote:

>> How can I get btrfs balance to work in the background, without
>> adversely affecting other applications?
>
> I'd actually suggest a different strategy.
>
> What I did here way back when I was still on reiserfs on spinning rust,
> where it made more difference than on ssd, but I kept the settings when
> I switched to ssd and btrfs, and at least some others have mentioned
> that similar settings helped them on btrfs as well, is...
>
> Problem: The kernel virtual-memory subsystem's writeback cache was
> originally configured for systems with well under a gigabyte of RAM,
> and the defaults no longer work so well on multi-GiB-RAM systems,
> particularly above 8 GiB RAM, because they are based on a percentage of
> available RAM, and will typically let several GiB of dirty writeback
> cache accumulate before kicking off any attempt to actually write it to
> storage. On spinning rust, when writeback /does/ finally kick off, this
> can result in hogging the IO for well over half a minute at a time,
> where 30 seconds also happens to be the default "flush it anyway" time.

This is somewhat like the buffer bloat discussion for networking... Big
buffers increase latency. There is more than one type of buffer.

Additionally to what Duncan wrote (the first type of buffer), the kernel
lately got a new option to fight this "buffer bloat": writeback
throttling. It may help to enable that option.

The second type of buffer is the io queue. So, you may also want to
lower the io queue depth (nr_requests) of your devices. I think it
defaults to 128, while most consumer drives only have a queue depth of
31 or 32 commands. Thus, reducing nr_requests for some of your devices
may help you achieve better latency (but reduces throughput).
Especially if working with io schedulers that do not implement io
priorities, you could simply lower nr_requests to around or below the
native command queue depth of your devices. The device itself can
handle it better in that case, especially on spinning rust, as the
firmware knows when to pull certain selected commands from the queue
during a rotation of the media. The kernel knows nothing about rotary
positions; it can only use the queue to prioritize and reorder requests
but cannot take advantage of the rotary positions of the heads.

See:
$ grep ^ /sys/block/*/queue/nr_requests

You may also get better results by increasing nr_requests instead, but
at the cost of also adjusting the write buffer sizes, because with large
nr_requests you don't want blocking on writes so early, at least not
when you need good latency. This probably works best for you with
schedulers that care about latency, like deadline or kyber.

For testing, keep in mind that every setting works in dependence on the
others. So change one at a time, take your tests, then change another
and see how that relates to the first change, even when the first change
made your experience worse.
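As one concrete example of such a single change (untested; sda and the
value 32 are only placeholders, pick something near your drive's native
queue depth, and note that the setting resets on reboot):

# current queue depth the kernel allows for the device
cat /sys/block/sda/queue/nr_requests
# lower it to roughly the drive's native command queue depth
echo 32 > /sys/block/sda/queue/nr_requests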
Another tip that's missing: Put different access classes onto different
devices. That is, if you have a directory structure that's mostly
written to, put it on its own physical devices, with separate tuning and
an appropriate filesystem (log-structured and CoW filesystems are good
at streaming writes). Put read-mostly workloads also on their own
device and filesystems. Put realtime workloads on their own device and
filesystems. This gives you a much easier chance to succeed.

-- 
Regards,
Kai

Replies to list-only preferred.
* Re: btrfs balance problems
From: James Courtier-Dutton @ 2018-01-06 18:09 UTC
To: Duncan; +Cc: linux-btrfs

On 28 December 2017 at 00:39, Duncan <1i5t5.duncan@cox.net> wrote:
>
> AFAIK, ionice only works for some IO schedulers, not all. It does work
> with the default CFQ scheduler, but I don't /believe/ it works with
> deadline, certainly not with noop, and I'd /guess/ it doesn't work with
> block-multiqueue (and thus not with bfq or kyber) at all, tho it's
> possible it does in the latest kernels, since multi-queue is targeted to
> eventually replace, at least as default, the older single-queue options.
>
> So which scheduler are you using, and are you on multi-queue or not?
>

Thank you. The install had defaulted to deadline.
I have now switched it to CFQ, and the system is much more
responsive/interactive now during a btrfs balance.
I will test it when I next get a chance, to see if that has helped me.

After reading about it:

deadline: more likely to complete long sequential reads/writes and not
switch tasks, thus reducing the amount of seeking but impacting
concurrent tasks.

cfq: more likely to break up long sequential reads/writes to permit
other tasks to do some work, thus increasing the amount of seeking but
helping concurrent tasks.

This would explain why "cfq" is best for me.
I have not yet looked at "multi-queue".
* Re: btrfs balance problems
From: Nikolay Borisov @ 2017-12-28 11:15 UTC
To: James Courtier-Dutton, linux-btrfs

On 23.12.2017 13:19, James Courtier-Dutton wrote:
> Hi,
>
> During a btrfs balance, the process hogs all CPU.
> Or, to be exact, any other program that wishes to use the SSD during a
> btrfs balance is blocked for long periods. Long periods being more
> than 5 seconds.
> Is there any way to multiplex SSD access while btrfs balance is
> operating, so that other applications can still access the SSD with
> relatively low latency?
>
> My guess is that btrfs is doing a transaction with a large number of
> SSD blocks at a time, and thus blocking other applications.
>
> This makes for atrocious user interactivity as well as applications
> failing because they cannot access the disk in a relatively
> low-latency manner.
> For example, this is causing a High Definition network CCTV
> application to fail.
>
> What I would really like is for some way to limit SSD bandwidth to
> applications.
> For example the CCTV app always gets the bandwidth it needs, and all
> other applications can still access the SSD, but are rate limited.
> This would fix my particular problem.
> We have rate limiting for network applications, why not disk access also?

So how are you running btrfs balance? Are you using any filters
whatsoever? The documentation
[https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance] has the
following warning:

Warning: running balance without filters will take a lot of time as it
basically rewrites the entire filesystem and needs to update all block
pointers.

> Kind Regards
>
> James
* Re: btrfs balance problems
From: Hans van Kranenburg @ 2017-12-30 0:43 UTC
To: Nikolay Borisov, James Courtier-Dutton, linux-btrfs

On 12/28/2017 12:15 PM, Nikolay Borisov wrote:
>
> On 23.12.2017 13:19, James Courtier-Dutton wrote:
>> Hi,
>>
>> During a btrfs balance, the process hogs all CPU.
>> Or, to be exact, any other program that wishes to use the SSD during a
>> btrfs balance is blocked for long periods. Long periods being more
>> than 5 seconds.
>> Is there any way to multiplex SSD access while btrfs balance is
>> operating, so that other applications can still access the SSD with
>> relatively low latency?
>>
>> My guess is that btrfs is doing a transaction with a large number of
>> SSD blocks at a time, and thus blocking other applications.
>>
>> This makes for atrocious user interactivity as well as applications
>> failing because they cannot access the disk in a relatively
>> low-latency manner.
>> For example, this is causing a High Definition network CCTV
>> application to fail.
>>
>> What I would really like is for some way to limit SSD bandwidth to
>> applications.
>> For example the CCTV app always gets the bandwidth it needs, and all
>> other applications can still access the SSD, but are rate limited.
>> This would fix my particular problem.
>> We have rate limiting for network applications, why not disk access also?
>
> So how are you running btrfs balance?

Or, to again take one step further back... *Why* are you running btrfs
balance at all? :)

> Are you using any filters
> whatsoever? The documentation
> [https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance] has the
> following warning:
>
> Warning: running balance without filters will take a lot of time as it
> basically rewrites the entire filesystem and needs to update all block
> pointers.

-- 
Hans van Kranenburg
Thread overview: 9+ messages
2017-12-23 11:19 btrfs balance problems James Courtier-Dutton
2017-12-23 11:56 ` Alberto Bursi
[not found] ` <CAAMvbhHV=BvRLv14U0JRrYmhiXeREOTNiVLPkuq=MO6dH4jDiQ@mail.gmail.com>
2017-12-27 21:39 ` James Courtier-Dutton
2017-12-27 21:54 ` waxhead
2017-12-28 0:39 ` Duncan
2017-12-30 0:34 ` Kai Krakow
2018-01-06 18:09 ` James Courtier-Dutton
2017-12-28 11:15 ` Nikolay Borisov
2017-12-30 0:43 ` Hans van Kranenburg