* Disabling in-memory write cache for x86-64 in Linux II
@ 2013-10-25  7:25 Artem S. Tashkinov
  2013-10-25  8:18 ` Linus Torvalds
  2013-10-25 10:49 ` NeilBrown
  0 siblings, 2 replies; 21+ messages in thread
From: Artem S. Tashkinov @ 2013-10-25  7:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, linux-fsdevel, axboe, linux-mm

Hello!

On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 kernel built for the i686 (with PAE) and x86-64 architectures. What's really troubling me is that the x86-64 kernel has the following problem:

When I copy large files to any storage device, be it my HDD with ext4 partitions or a flash drive with FAT32 partitions, the kernel first caches them in memory entirely, then flushes them some time later (quite unpredictably, though) or immediately upon invoking "sync".

How can I disable this memory cache altogether (or at least minimize caching)? When running the i686 kernel with the same configuration I don't observe this effect - files get written out almost immediately (for instance, "sync" takes less than a second, whereas on x86-64 it can take a dozen _minutes_ depending on file size and storage performance).

I'm _not_ talking about disabling the write cache on my storage itself (hdparm -W 0 /dev/XXX) - firstly, that command is detrimental to the performance of my PC; secondly, it won't help in this instance.

Swap is totally disabled, and usually my memory is entirely free.

My kernel configuration can be fetched here: https://bugzilla.kernel.org/show_bug.cgi?id=63531

Please advise.

Best regards,

Artem

^ permalink raw reply	[flat|nested] 21+ messages in thread
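For context, the behaviour described above is governed by the kernel's vm.dirty_* writeback tunables rather than by the device's own write cache; a minimal sketch of how to inspect them and watch dirty pages build up during a copy (the 1-second interval is arbitrary):

    # Current writeback tunables (the ratios are % of dirtyable memory;
    # the *_bytes variants, when non-zero, override the ratios)
    sysctl vm.dirty_ratio vm.dirty_background_ratio \
           vm.dirty_bytes vm.dirty_background_bytes \
           vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

    # Watch dirty/writeback pages accumulate and drain while copying a large file
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
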
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 7:25 Disabling in-memory write cache for x86-64 in Linux II Artem S. Tashkinov @ 2013-10-25 8:18 ` Linus Torvalds 2013-11-05 0:50 ` Andreas Dilger 2013-11-05 6:32 ` Figo.zhang 2013-10-25 10:49 ` NeilBrown 1 sibling, 2 replies; 21+ messages in thread From: Linus Torvalds @ 2013-10-25 8:18 UTC (permalink / raw) To: Artem S. Tashkinov, Wu Fengguang, Andrew Morton Cc: Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Fri, Oct 25, 2013 at 8:25 AM, Artem S. Tashkinov <t.artem@lycos.com> wrote: > > On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 kernel > built for the i686 (with PAE) and x86-64 architectures. What's really troubling me > is that the x86-64 kernel has the following problem: > > When I copy large files to any storage device, be it my HDD with ext4 partitions > or flash drive with FAT32 partitions, the kernel first caches them in memory entirely > then flushes them some time later (quite unpredictably though) or immediately upon > invoking "sync". Yeah, I think we default to a 10% "dirty background memory" (and allows up to 20% dirty), so on your 16GB machine, we allow up to 1.6GB of dirty memory for writeout before we even start writing, and twice that before we start *waiting* for it. On 32-bit x86, we only count the memory in the low 1GB (really actually up to about 890MB), so "10% dirty" really means just about 90MB of buffering (and a "hard limit" of ~180MB of dirty). And that "up to 3.2GB of dirty memory" is just crazy. Our defaults come from the old days of less memory (and perhaps servers that don't much care), and the fact that x86-32 ends up having much lower limits even if you end up having more memory. You can easily tune it: echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes or similar. But you're right, we need to make the defaults much saner. Wu? Andrew? Comments? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
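If the values Linus suggests work out on a given machine, they can be made persistent with sysctl rather than raw echoes; a sketch assuming a sysctl.d-style setup (the file name is arbitrary; note that setting the *_bytes variants makes the kernel ignore the corresponding *_ratio settings):

    # /etc/sysctl.d/90-writeback.conf
    # start background writeback at 16MB of dirty data, throttle writers at 48MB
    vm.dirty_background_bytes = 16777216
    vm.dirty_bytes = 50331648

    # apply without rebooting
    sysctl -p /etc/sysctl.d/90-writeback.conf
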
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 8:18 ` Linus Torvalds @ 2013-11-05 0:50 ` Andreas Dilger 2013-11-05 4:12 ` Dave Chinner 2013-11-05 6:32 ` Figo.zhang 1 sibling, 1 reply; 21+ messages in thread From: Andreas Dilger @ 2013-11-05 0:50 UTC (permalink / raw) To: Artem S. Tashkinov Cc: Wu Fengguang, Linus Torvalds, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Oct 25, 2013, at 2:18 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Oct 25, 2013 at 8:25 AM, Artem S. Tashkinov <t.artem@lycos.com> wrote: >> >> On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 >> kernel built for the i686 (with PAE) and x86-64 architectures. What’s >> really troubling me is that the x86-64 kernel has the following problem: >> >> When I copy large files to any storage device, be it my HDD with ext4 >> partitions or flash drive with FAT32 partitions, the kernel first >> caches them in memory entirely then flushes them some time later >> (quite unpredictably though) or immediately upon invoking "sync". > > Yeah, I think we default to a 10% "dirty background memory" (and > allows up to 20% dirty), so on your 16GB machine, we allow up to 1.6GB > of dirty memory for writeout before we even start writing, and twice > that before we start *waiting* for it. > > On 32-bit x86, we only count the memory in the low 1GB (really > actually up to about 890MB), so "10% dirty" really means just about > 90MB of buffering (and a "hard limit" of ~180MB of dirty). > > And that "up to 3.2GB of dirty memory" is just crazy. Our defaults > come from the old days of less memory (and perhaps servers that don't > much care), and the fact that x86-32 ends up having much lower limits > even if you end up having more memory. I think the “delay writes for a long time” is a holdover from the days when e.g. /tmp was on a disk and compilers had lousy IO patterns, then they deleted the file. Today, /tmp is always in RAM, and IMHO the “write and delete” workload tested by dbench is not worthwhile optimizing for. With Lustre, we’ve long taken the approach that if there is enough dirty data on a file to make a decent write (which is around 8MB today even for very fast storage) then there isn’t much point to hold back for more data before starting the IO. Any decent allocator will be able to grow allocated extents to handle following data, or allocate a new extent. At 4-8MB extents, even very seek-impaired media could do 400-800MB/s (likely much faster than the underlying storage anyway). This also avoids wasting (tens of?) seconds of idle disk bandwidth. If the disk is already busy, then the IO will be delayed anyway. If it is not busy, then why aggregate GB of dirty data in memory before flushing it? Something simple like “start writing at 16MB dirty on a single file” would probably avoid a lot of complexity at little real-world cost. That shouldn’t throttle dirtying memory above 16MB, but just start writeout much earlier than it does today. Cheers, Andreas -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
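There is no mainline knob for Andreas's per-file "start writing at 16MB" policy, but an application can approximate the effect from userspace by forcing IO out in fixed-size chunks; a rough sketch using dd (the file names and the 8MB block size are arbitrary):

    # O_DIRECT: each 8MB chunk goes straight to the device, so writeout starts
    # immediately instead of gigabytes of dirty data piling up in the page cache
    dd if=bigfile of=/mnt/target/bigfile bs=8M oflag=direct

    # Buffered variant: let the page cache absorb the copy, but have dd fsync
    # at the end so "finished" means "on media", not "cached"
    dd if=bigfile of=/mnt/target/bigfile bs=8M conv=fsync
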
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-05 0:50 ` Andreas Dilger @ 2013-11-05 4:12 ` Dave Chinner 2013-11-07 13:48 ` Jan Kara 0 siblings, 1 reply; 21+ messages in thread From: Dave Chinner @ 2013-11-05 4:12 UTC (permalink / raw) To: Andreas Dilger Cc: Artem S. Tashkinov, Wu Fengguang, Linus Torvalds, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Mon, Nov 04, 2013 at 05:50:13PM -0700, Andreas Dilger wrote: > > On Oct 25, 2013, at 2:18 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Fri, Oct 25, 2013 at 8:25 AM, Artem S. Tashkinov <t.artem@lycos.com> wrote: > >> > >> On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 > >> kernel built for the i686 (with PAE) and x86-64 architectures. What’s > >> really troubling me is that the x86-64 kernel has the following problem: > >> > >> When I copy large files to any storage device, be it my HDD with ext4 > >> partitions or flash drive with FAT32 partitions, the kernel first > >> caches them in memory entirely then flushes them some time later > >> (quite unpredictably though) or immediately upon invoking "sync". > > > > Yeah, I think we default to a 10% "dirty background memory" (and > > allows up to 20% dirty), so on your 16GB machine, we allow up to 1.6GB > > of dirty memory for writeout before we even start writing, and twice > > that before we start *waiting* for it. > > > > On 32-bit x86, we only count the memory in the low 1GB (really > > actually up to about 890MB), so "10% dirty" really means just about > > 90MB of buffering (and a "hard limit" of ~180MB of dirty). > > > > And that "up to 3.2GB of dirty memory" is just crazy. Our defaults > > come from the old days of less memory (and perhaps servers that don't > > much care), and the fact that x86-32 ends up having much lower limits > > even if you end up having more memory. > > I think the “delay writes for a long time” is a holdover from the > days when e.g. /tmp was on a disk and compilers had lousy IO > patterns, then they deleted the file. Today, /tmp is always in > RAM, and IMHO the “write and delete” workload tested by dbench > is not worthwhile optimizing for. > > With Lustre, we’ve long taken the approach that if there is enough > dirty data on a file to make a decent write (which is around 8MB > today even for very fast storage) then there isn’t much point to > hold back for more data before starting the IO. Agreed - write-through caching is much better for high throughput streaming data environments than write back caching that can leave the devices unnecessarily idle. However, most systems are not running in high-throughput streaming data environments... :/ > Any decent allocator will be able to grow allocated extents to > handle following data, or allocate a new extent. At 4-8MB extents, > even very seek-impaired media could do 400-800MB/s (likely much > faster than the underlying storage anyway). True, but this makes the assumption that the filesystem you are using is optimising purely for write throughput and your storage is not seek limited on reads. That's simply not an assumption we can allow the generic writeback code to make. In more detail, if we simply implement "we have 8 MB of dirty pages on a single file, write it" we can maximise write throughput by allocating sequentially on disk for each subsquent write. The problem with this comes when you are writing multiple files at a time, and that leads to this pattern on disk: ABC...ABC....ABC....ABC.... 
And the result is a) fragmented files b) a large number of seeks during sequential read operations and c) filesystems that age and degrade rapidly under workloads that concurrently write files with different life times (i.e. due to free space fragmention). In some situations this is acceptable, but the performance degradation as the filesystem ages that this sort of allocation causes in most environments is not. I'd say that >90% of filesystems out there would suffer accelerated aging as a result of doing writeback in this manner by default. > This also avoids wasting (tens of?) seconds of idle disk bandwidth. > If the disk is already busy, then the IO will be delayed anyway. > If it is not busy, then why aggregate GB of dirty data in memory > before flushing it? There are plenty of workloads out there where delaying IO for a few seconds can result in writeback that is an order of magnitude faster. Similarly, I've seen other workloads where the writeback delay results in files that can be *read* orders of magnitude faster.... > Something simple like “start writing at 16MB dirty on a single file” > would probably avoid a lot of complexity at little real-world cost. > That shouldn’t throttle dirtying memory above 16MB, but just start > writeout much earlier than it does today. That doesn't solve the "slow device, large file" problem. We can write data into the page cache at rates of over a GB/s, so it's irrelevant to a device that can write at 5MB/s whether we start writeback immediately or a second later when there is 500MB of dirty pages in memory. AFAIK, the only way to avoid that problem is to use write-through caching for such devices - where they throttle to the IO rate at very low levels of cached data. Realistically, there is no "one right answer" for all combinations of applications, filesystems and hardware, but writeback caching is the best *general solution* we've got right now. However, IMO users should not need to care about tuning BDI dirty ratios or even have to understand what a BDI dirty ratio is to select the rigth caching method for their devices and/or workload. The difference between writeback and write through caching is easy to explain and AFAICT those two modes suffice to solve the problems being discussed here. Further, if two modes suffice to solve the problems, then we should be able to easily define a trigger to automatically switch modes. /me notes that if we look at random vs sequential IO and the impact that has on writeback duration, then it's very similar to suddenly having a very slow device. IOWs, fadvise(RANDOM) could be used to switch an *inode* to write through mode rather than writeback mode to solve the problem aggregating massive amounts of random write IO in the page cache... So rather than treating this as a "one size fits all" type of problem, let's step back and: a) define 2-3 different caching behaviours we consider optimal for the majority of workloads/hardware we care about. b) determine optimal workloads for each caching behaviour. c) develop reliable triggers to detect when we should switch between caching behaviours. e.g: a) write back caching - what we have now write through caching - extremely low dirty threshold before writeback starts, enough to optimise for, say, stripe width of the underlying storage. b) write back caching: - general purpose workload write through caching: - slow device, write large file, sync - extremely high bandwidth devices, multi-stream sequential IO - random IO. 
c) write back caching: - default - fadvise(NORMAL, SEQUENTIAL, WILLNEED) write through caching: - fadvise(NOREUSE, DONTNEED, RANDOM) - random IO - sequential IO, BDI write bandwidth <<< dirty threshold - sequential IO, BDI write bandwidth >>> dirty threshold I think that covers most of the issues and use cases that have been discussed in this thread. IMO, this is the level at which we need to solve the problem (i.e. architectural), not at the level of "let's add sysfs variables so we can tweak bdi ratios". Indeed, the above implies that we need the caching behaviour to be a property of the address space, not just a property of the backing device. IOWs, the implementation needs to trickle down from a coherent high level design - that will define the knobs that we need to expose to userspace. We should not be adding new writeback behaviours by adding knobs to sysfs without first having some clue about whether we are solving the right problem and solving it in a sane manner... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
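The allocation interleaving Dave describes can be observed directly on an existing filesystem; a hedged sketch using filefrag from e2fsprogs (the file names, sizes and the choice of two concurrent writers are arbitrary, and whether interleaving actually shows up depends on the filesystem's allocator and delayed allocation):

    # Write two files concurrently in small buffered chunks...
    dd if=/dev/zero of=fileA bs=64k count=4000 &
    dd if=/dev/zero of=fileB bs=64k count=4000 &
    wait
    sync
    # ...then see how many extents each file ended up with
    filefrag -v fileA fileB
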
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-05 4:12 ` Dave Chinner @ 2013-11-07 13:48 ` Jan Kara 2013-11-11 3:22 ` Dave Chinner 0 siblings, 1 reply; 21+ messages in thread From: Jan Kara @ 2013-11-07 13:48 UTC (permalink / raw) To: Dave Chinner Cc: Andreas Dilger, Artem S. Tashkinov, Wu Fengguang, Linus Torvalds, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Tue 05-11-13 15:12:45, Dave Chinner wrote: > On Mon, Nov 04, 2013 at 05:50:13PM -0700, Andreas Dilger wrote: > > Something simple like “start writing at 16MB dirty on a single file” > > would probably avoid a lot of complexity at little real-world cost. > > That shouldn’t throttle dirtying memory above 16MB, but just start > > writeout much earlier than it does today. > > That doesn't solve the "slow device, large file" problem. We can > write data into the page cache at rates of over a GB/s, so it's > irrelevant to a device that can write at 5MB/s whether we start > writeback immediately or a second later when there is 500MB of dirty > pages in memory. AFAIK, the only way to avoid that problem is to > use write-through caching for such devices - where they throttle to > the IO rate at very low levels of cached data. Agreed. > Realistically, there is no "one right answer" for all combinations > of applications, filesystems and hardware, but writeback caching is > the best *general solution* we've got right now. > > However, IMO users should not need to care about tuning BDI dirty > ratios or even have to understand what a BDI dirty ratio is to > select the rigth caching method for their devices and/or workload. > The difference between writeback and write through caching is easy > to explain and AFAICT those two modes suffice to solve the problems > being discussed here. Further, if two modes suffice to solve the > problems, then we should be able to easily define a trigger to > automatically switch modes. > > /me notes that if we look at random vs sequential IO and the impact > that has on writeback duration, then it's very similar to suddenly > having a very slow device. IOWs, fadvise(RANDOM) could be used to > switch an *inode* to write through mode rather than writeback mode > to solve the problem aggregating massive amounts of random write IO > in the page cache... I disagree here. Writeback cache is also useful for aggregating random writes and making semi-sequential writes out of them. There are quite some applications which rely on the fact that they can write a file in a rather random manner (Berkeley DB, linker, ...) but the files are written out in one large linear sweep. That is actually the reason why SLES (and I believe RHEL as well) tune dirty_limit even higher than what's the default value. So I think it's rather the other way around: If you can detect the file is being written in a streaming manner, there's not much point in caching too much data for it. And I agree with you that we also have to be careful not to cache too few because otherwise two streaming writes would be interleaved too much. Currently, we have writeback_chunk_size() which determines how much we ask to write from a single inode. So streaming writers are going to be interleaved at this chunk size anyway (currently that number is "measured bandwidth / 2"). So it would make sense to also limit amount of dirty cache for each file with streaming pattern at this number. 
> So rather than treating this as a "one size fits all" type of > problem, let's step back and: > > a) define 2-3 different caching behaviours we consider > optimal for the majority of workloads/hardware we care > about. > b) determine optimal workloads for each caching > behaviour. > c) develop reliable triggers to detect when we > should switch between caching behaviours. > > e.g: > > a) write back caching > - what we have now > write through caching > - extremely low dirty threshold before writeback > starts, enough to optimise for, say, stripe width > of the underlying storage. > > b) write back caching: > - general purpose workload > write through caching: > - slow device, write large file, sync > - extremely high bandwidth devices, multi-stream > sequential IO > - random IO. > > c) write back caching: > - default > - fadvise(NORMAL, SEQUENTIAL, WILLNEED) > write through caching: > - fadvise(NOREUSE, DONTNEED, RANDOM) > - random IO > - sequential IO, BDI write bandwidth <<< dirty threshold > - sequential IO, BDI write bandwidth >>> dirty threshold > > I think that covers most of the issues and use cases that have been > discussed in this thread. IMO, this is the level at which we need to > solve the problem (i.e. architectural), not at the level of "let's > add sysfs variables so we can tweak bdi ratios". > > Indeed, the above implies that we need the caching behaviour to be a > property of the address space, not just a property of the backing > device. Yes, and that would be interesting to implement and not make a mess out of the whole writeback logic because the way we currently do writeback is inherently BDI based. When we introduce some special per-inode limits, flusher threads would have to pick more carefully what to write and what not. We might be forced to go that way eventually anyway because of memcg aware writeback but it's not a simple step. > IOWs, the implementation needs to trickle down from a coherent high > level design - that will define the knobs that we need to expose to > userspace. We should not be adding new writeback behaviours by > adding knobs to sysfs without first having some clue about whether > we are solving the right problem and solving it in a sane manner... Agreed. But the ability to limit amount of dirty pages outstanding against a particular BDI seems as a sane one to me. It's not as flexible and automatic as the approach you suggested but it's much simpler and solves most of problems we currently have. The biggest objection against the sysfs-tunable approach is that most people won't have a clue meaning that the tunable is useless for them. But I wonder if something like: 1) turn on strictlimit by default 2) don't allow dirty cache of BDI to grow over 5s of measured writeback speed won't go a long way into solving our current problems without too much complication... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
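For reference, a coarse per-BDI control already exists in sysfs: every backing device exposes min_ratio and max_ratio, expressed as a percentage of the global dirty threshold; a sketch (the 8:16 device id and the 1% cap are examples, and how early the cap actually bites still depends on the global thresholds):

    # Backing devices are listed by major:minor
    ls /sys/class/bdi/
    cat /sys/class/bdi/8:16/max_ratio      # defaults to 100
    # Cap this device at 1% of the global dirty limit
    echo 1 > /sys/class/bdi/8:16/max_ratio
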
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-07 13:48 ` Jan Kara @ 2013-11-11 3:22 ` Dave Chinner 2013-11-11 19:31 ` Jan Kara 0 siblings, 1 reply; 21+ messages in thread From: Dave Chinner @ 2013-11-11 3:22 UTC (permalink / raw) To: Jan Kara Cc: Andreas Dilger, Artem S. Tashkinov, Wu Fengguang, Linus Torvalds, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Thu, Nov 07, 2013 at 02:48:06PM +0100, Jan Kara wrote: > On Tue 05-11-13 15:12:45, Dave Chinner wrote: > > On Mon, Nov 04, 2013 at 05:50:13PM -0700, Andreas Dilger wrote: > > > Something simple like “start writing at 16MB dirty on a single file” > > > would probably avoid a lot of complexity at little real-world cost. > > > That shouldn’t throttle dirtying memory above 16MB, but just start > > > writeout much earlier than it does today. > > > > That doesn't solve the "slow device, large file" problem. We can > > write data into the page cache at rates of over a GB/s, so it's > > irrelevant to a device that can write at 5MB/s whether we start > > writeback immediately or a second later when there is 500MB of dirty > > pages in memory. AFAIK, the only way to avoid that problem is to > > use write-through caching for such devices - where they throttle to > > the IO rate at very low levels of cached data. > Agreed. > > > Realistically, there is no "one right answer" for all combinations > > of applications, filesystems and hardware, but writeback caching is > > the best *general solution* we've got right now. > > > > However, IMO users should not need to care about tuning BDI dirty > > ratios or even have to understand what a BDI dirty ratio is to > > select the rigth caching method for their devices and/or workload. > > The difference between writeback and write through caching is easy > > to explain and AFAICT those two modes suffice to solve the problems > > being discussed here. Further, if two modes suffice to solve the > > problems, then we should be able to easily define a trigger to > > automatically switch modes. > > > > /me notes that if we look at random vs sequential IO and the impact > > that has on writeback duration, then it's very similar to suddenly > > having a very slow device. IOWs, fadvise(RANDOM) could be used to > > switch an *inode* to write through mode rather than writeback mode > > to solve the problem aggregating massive amounts of random write IO > > in the page cache... > I disagree here. Writeback cache is also useful for aggregating random > writes and making semi-sequential writes out of them. There are quite some > applications which rely on the fact that they can write a file in a rather > random manner (Berkeley DB, linker, ...) but the files are written out in > one large linear sweep. That is actually the reason why SLES (and I believe > RHEL as well) tune dirty_limit even higher than what's the default value. Right - but the correct behaviour really depends on the pattern of randomness. The common case we get into trouble with is when no clustering occurs and we end up with small, random IO for gigabytes of cached data. That's the case where write-through caching for random data is better. It's also questionable whether writeback caching for aggregation is faster for random IO on high-IOPS devices or not. Again, I think it woul depend very much on how random the patterns are... 
> So I think it's rather the other way around: If you can detect the file is > being written in a streaming manner, there's not much point in caching too > much data for it. But we're not talking about how much data we cache here - we are considering how much data we allow to get dirty before writing it back. It doesn't matter if we use writeback or write through caching, the page cache footprint for a given workload is likely to be similar, but without any data we can't draw any conclusions here. > And I agree with you that we also have to be careful not > to cache too few because otherwise two streaming writes would be > interleaved too much. Currently, we have writeback_chunk_size() which > determines how much we ask to write from a single inode. So streaming > writers are going to be interleaved at this chunk size anyway (currently > that number is "measured bandwidth / 2"). So it would make sense to also > limit amount of dirty cache for each file with streaming pattern at this > number. My experience says that for streaming IO we typically need at least 5s of cached *dirty* data to even out delays and latencies in the writeback IO pipeline. Hence limiting a file to what we can write in a second given we might only write a file once a second is likely going to result in pipeline stalls... Remember, writeback caching is about maximising throughput, not minimising latency. The "sync latency" problem with caching too much dirty data on slow block devices is really a corner case behaviour and should not compromise the common case for bulk writeback throughput. > > Indeed, the above implies that we need the caching behaviour to be a > > property of the address space, not just a property of the backing > > device. > Yes, and that would be interesting to implement and not make a mess out > of the whole writeback logic because the way we currently do writeback is > inherently BDI based. When we introduce some special per-inode limits, > flusher threads would have to pick more carefully what to write and what > not. We might be forced to go that way eventually anyway because of memcg > aware writeback but it's not a simple step. Agreed, it's not simple, and that's why we need to start working from the architectural level.... > > IOWs, the implementation needs to trickle down from a coherent high > > level design - that will define the knobs that we need to expose to > > userspace. We should not be adding new writeback behaviours by > > adding knobs to sysfs without first having some clue about whether > > we are solving the right problem and solving it in a sane manner... > Agreed. But the ability to limit amount of dirty pages outstanding > against a particular BDI seems as a sane one to me. It's not as flexible > and automatic as the approach you suggested but it's much simpler and > solves most of problems we currently have. That's true, but.... > The biggest objection against the sysfs-tunable approach is that most > people won't have a clue meaning that the tunable is useless for them. .... that's the big problem I see - nobody is going to know how to use it, when to use it, or be able to tell if it's the root cause of some weird performance problem they are seeing. > But I > wonder if something like: > 1) turn on strictlimit by default > 2) don't allow dirty cache of BDI to grow over 5s of measured writeback > speed > > won't go a long way into solving our current problems without too much > complication... Turning on strict limit by default is going to change behaviour quite markedly. 
Again, it's not something I'd want to see done without a bunch of data showing that it doesn't cause regressions for common workloads... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
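The "measured writeback speed" both of them refer to is the per-BDI bandwidth estimate the writeback code already maintains; it can be read from debugfs, which helps when reasoning about what "N seconds of dirty data" would mean for a particular device; a sketch (the 8:0 device id is an example, and debugfs must be mounted):

    mount -t debugfs none /sys/kernel/debug 2>/dev/null   # if not already mounted
    cat /sys/kernel/debug/bdi/8:0/stats
    # Lines of interest include BdiWriteBandwidth (the running estimate, in kB/s)
    # and BdiDirtyThresh / DirtyThresh / BackgroundThresh (the current limits)
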
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-11 3:22 ` Dave Chinner @ 2013-11-11 19:31 ` Jan Kara 0 siblings, 0 replies; 21+ messages in thread From: Jan Kara @ 2013-11-11 19:31 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Andreas Dilger, Artem S. Tashkinov, Wu Fengguang, Linus Torvalds, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm On Mon 11-11-13 14:22:11, Dave Chinner wrote: > On Thu, Nov 07, 2013 at 02:48:06PM +0100, Jan Kara wrote: > > On Tue 05-11-13 15:12:45, Dave Chinner wrote: > > > On Mon, Nov 04, 2013 at 05:50:13PM -0700, Andreas Dilger wrote: > > > Realistically, there is no "one right answer" for all combinations > > > of applications, filesystems and hardware, but writeback caching is > > > the best *general solution* we've got right now. > > > > > > However, IMO users should not need to care about tuning BDI dirty > > > ratios or even have to understand what a BDI dirty ratio is to > > > select the rigth caching method for their devices and/or workload. > > > The difference between writeback and write through caching is easy > > > to explain and AFAICT those two modes suffice to solve the problems > > > being discussed here. Further, if two modes suffice to solve the > > > problems, then we should be able to easily define a trigger to > > > automatically switch modes. > > > > > > /me notes that if we look at random vs sequential IO and the impact > > > that has on writeback duration, then it's very similar to suddenly > > > having a very slow device. IOWs, fadvise(RANDOM) could be used to > > > switch an *inode* to write through mode rather than writeback mode > > > to solve the problem aggregating massive amounts of random write IO > > > in the page cache... > > I disagree here. Writeback cache is also useful for aggregating random > > writes and making semi-sequential writes out of them. There are quite some > > applications which rely on the fact that they can write a file in a rather > > random manner (Berkeley DB, linker, ...) but the files are written out in > > one large linear sweep. That is actually the reason why SLES (and I believe > > RHEL as well) tune dirty_limit even higher than what's the default value. > > Right - but the correct behaviour really depends on the pattern of > randomness. The common case we get into trouble with is when no > clustering occurs and we end up with small, random IO for gigabytes > of cached data. That's the case where write-through caching for > random data is better. > > It's also questionable whether writeback caching for aggregation is > faster for random IO on high-IOPS devices or not. Again, I think it > woul depend very much on how random the patterns are... I agree usefulness of writeback caching for random IO very much depends on the working set size vs cache size, how random the accesses really are, and HW characteristics. I just wanted to point out there are fairly common workloads & setups where writeback caching for semi-random IO really helps (because you seemed to suggest that random IO implies we should disable writeback cache). > > So I think it's rather the other way around: If you can detect the file is > > being written in a streaming manner, there's not much point in caching too > > much data for it. > > But we're not talking about how much data we cache here - we are > considering how much data we allow to get dirty before writing it > back. Sorry, I was imprecise here. 
I really meant that IMO it doesn't make sense to allow too much dirty data for sequentially written files. > It doesn't matter if we use writeback or write through > caching, the page cache footprint for a given workload is likely to > be similar, but without any data we can't draw any conclusions here. > > > And I agree with you that we also have to be careful not > > to cache too few because otherwise two streaming writes would be > > interleaved too much. Currently, we have writeback_chunk_size() which > > determines how much we ask to write from a single inode. So streaming > > writers are going to be interleaved at this chunk size anyway (currently > > that number is "measured bandwidth / 2"). So it would make sense to also > > limit amount of dirty cache for each file with streaming pattern at this > > number. > > My experience says that for streaming IO we typically need at least > 5s of cached *dirty* data to even out delays and latencies in the > writeback IO pipeline. Hence limiting a file to what we can write in > a second given we might only write a file once a second is likely > going to result in pipeline stalls... I guess this begs for real data. We agree in principle but differ in constants :). > Remember, writeback caching is about maximising throughput, not > minimising latency. The "sync latency" problem with caching too much > dirty data on slow block devices is really a corner case behaviour > and should not compromise the common case for bulk writeback > throughput. Agreed. As a primary goal we want to maximise throughput. But we want to maintain sane latency as well (e.g. because we have a "promise" of "dirty_writeback_centisecs" we have to cycle through dirty inodes reasonably frequently). > > Agreed. But the ability to limit amount of dirty pages outstanding > > against a particular BDI seems as a sane one to me. It's not as flexible > > and automatic as the approach you suggested but it's much simpler and > > solves most of problems we currently have. > > That's true, but.... > > > The biggest objection against the sysfs-tunable approach is that most > > people won't have a clue meaning that the tunable is useless for them. > > .... that's the big problem I see - nobody is going to know how to > use it, when to use it, or be able to tell if it's the root cause of > some weird performance problem they are seeing. > > > But I > > wonder if something like: > > 1) turn on strictlimit by default > > 2) don't allow dirty cache of BDI to grow over 5s of measured writeback > > speed > > > > won't go a long way into solving our current problems without too much > > complication... > > Turning on strict limit by default is going to change behaviour > quite markedly. Again, it's not something I'd want to see done > without a bunch of data showing that it doesn't cause regressions > for common workloads... Agreed. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II
  2013-10-25  8:18 ` Linus Torvalds
  2013-11-05  0:50   ` Andreas Dilger
@ 2013-11-05  6:32   ` Figo.zhang
  1 sibling, 0 replies; 21+ messages in thread
From: Figo.zhang @ 2013-11-05  6:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Artem S. Tashkinov, Wu Fengguang, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, Jens Axboe, linux-mm

> Yeah, I think we default to a 10% "dirty background memory" (and
> allows up to 20% dirty), so on your 16GB machine, we allow up to 1.6GB
> of dirty memory for writeout before we even start writing, and twice
> that before we start *waiting* for it.
>
> On 32-bit x86, we only count the memory in the low 1GB (really
> actually up to about 890MB), so "10% dirty" really means just about
> 90MB of buffering (and a "hard limit" of ~180MB of dirty).

=> On a 32-bit system the page cache can also use high memory, so the 10% "dirty background memory" may also come to about 1.6GB in this case.

^ permalink raw reply	[flat|nested] 21+ messages in thread
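Whether highmem is counted towards the dirty limits on the 32-bit kernel is controlled by a sysctl (only present on CONFIG_HIGHMEM kernels); a sketch to check it on the i686/PAE kernel:

    # How the PAE kernel's memory splits into lowmem and highmem
    grep -E '^(MemTotal|LowTotal|HighTotal):' /proc/meminfo
    # Whether highmem counts towards the dirty thresholds (0 = lowmem only, the default)
    cat /proc/sys/vm/highmem_is_dirtyable
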
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 7:25 Disabling in-memory write cache for x86-64 in Linux II Artem S. Tashkinov 2013-10-25 8:18 ` Linus Torvalds @ 2013-10-25 10:49 ` NeilBrown 2013-10-25 11:26 ` David Lang 1 sibling, 1 reply; 21+ messages in thread From: NeilBrown @ 2013-10-25 10:49 UTC (permalink / raw) To: Artem S. Tashkinov; +Cc: linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm [-- Attachment #1: Type: text/plain, Size: 2094 bytes --] On Fri, 25 Oct 2013 07:25:13 +0000 (UTC) "Artem S. Tashkinov" <t.artem@lycos.com> wrote: > Hello! > > On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 kernel > built for the i686 (with PAE) and x86-64 architectures. What's really troubling me > is that the x86-64 kernel has the following problem: > > When I copy large files to any storage device, be it my HDD with ext4 partitions > or flash drive with FAT32 partitions, the kernel first caches them in memory entirely > then flushes them some time later (quite unpredictably though) or immediately upon > invoking "sync". > > How can I disable this memory cache altogether (or at least minimize caching)? When > running the i686 kernel with the same configuration I don't observe this effect - files get > written out almost immediately (for instance "sync" takes less than a second, whereas > on x86-64 it can take a dozen of _minutes_ depending on a file size and storage > performance). What exactly is bothering you about this? The amount of memory used or the time until data is flushed? If the later, then /proc/sys/vm/dirty_expire_centisecs is where you want to look. This defaults to 30 seconds (3000 centisecs). You could make it smaller (providing you also shrink dirty_writeback_centisecs in a similar ratio) and the VM will flush out data more quickly. NeilBrown > > I'm _not_ talking about disabling write cache on my storage itself (hdparm -W 0 /dev/XXX) > - firstly this command is detrimental to the performance of my PC, secondly, it won't help > in this instance. > > Swap is totally disabled, usually my memory is entirely free. > > My kernel configuration can be fetched here: https://bugzilla.kernel.org/show_bug.cgi?id=63531 > > Please, advise. > > Best regards, > > Artem > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
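A concrete version of Neil's suggestion, shrinking both timers in roughly the same ratio (the 1s/0.5s values are only illustrative):

    # Defaults: the flusher wakes every 5s, data is written back once it is 30s old
    sysctl vm.dirty_writeback_centisecs vm.dirty_expire_centisecs
    # Write back anything dirty for more than 1s, checking about every 0.5s
    echo 100 > /proc/sys/vm/dirty_expire_centisecs
    echo 50  > /proc/sys/vm/dirty_writeback_centisecs
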
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 10:49 ` NeilBrown @ 2013-10-25 11:26 ` David Lang 2013-10-25 18:26 ` Artem S. Tashkinov 0 siblings, 1 reply; 21+ messages in thread From: David Lang @ 2013-10-25 11:26 UTC (permalink / raw) To: NeilBrown Cc: Artem S. Tashkinov, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm On Fri, 25 Oct 2013, NeilBrown wrote: > On Fri, 25 Oct 2013 07:25:13 +0000 (UTC) "Artem S. Tashkinov" > <t.artem@lycos.com> wrote: > >> Hello! >> >> On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 kernel >> built for the i686 (with PAE) and x86-64 architectures. What's really troubling me >> is that the x86-64 kernel has the following problem: >> >> When I copy large files to any storage device, be it my HDD with ext4 partitions >> or flash drive with FAT32 partitions, the kernel first caches them in memory entirely >> then flushes them some time later (quite unpredictably though) or immediately upon >> invoking "sync". >> >> How can I disable this memory cache altogether (or at least minimize caching)? When >> running the i686 kernel with the same configuration I don't observe this effect - files get >> written out almost immediately (for instance "sync" takes less than a second, whereas >> on x86-64 it can take a dozen of _minutes_ depending on a file size and storage >> performance). > > What exactly is bothering you about this? The amount of memory used or the > time until data is flushed? actually, I think the problem is more the impact of the huge write later on. David Lang > If the later, then /proc/sys/vm/dirty_expire_centisecs is where you want to > look. > This defaults to 30 seconds (3000 centisecs). > You could make it smaller (providing you also shrink > dirty_writeback_centisecs in a similar ratio) and the VM will flush out data > more quickly. > > NeilBrown > > >> >> I'm _not_ talking about disabling write cache on my storage itself (hdparm -W 0 /dev/XXX) >> - firstly this command is detrimental to the performance of my PC, secondly, it won't help >> in this instance. >> >> Swap is totally disabled, usually my memory is entirely free. >> >> My kernel configuration can be fetched here: https://bugzilla.kernel.org/show_bug.cgi?id=63531 >> >> Please, advise. >> >> Best regards, >> >> Artem >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II
  2013-10-25 11:26 ` David Lang
@ 2013-10-25 18:26   ` Artem S. Tashkinov
  2013-10-25 19:40     ` Diego Calleja
  ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Artem S. Tashkinov @ 2013-10-25 18:26 UTC (permalink / raw)
  To: david; +Cc: neilb, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm

Oct 25, 2013 05:26:45 PM, david wrote:
On Fri, 25 Oct 2013, NeilBrown wrote:
>
>> What exactly is bothering you about this? The amount of memory used or the
>> time until data is flushed?
>
> actually, I think the problem is more the impact of the huge write later on.

Exactly. And not being able to use applications which show you IO performance, like Midnight Commander. You might prefer to use "cp -a", but I cannot imagine my life without being able to see the progress of a copying operation. With the current dirty cache there's no way to understand how your storage media actually behaves.

Hopefully this issue won't dissolve into obscurity, and someone will actually come up with a plan (and a patch) for making the dirty write cache behave in a sane manner, considering the fact that there are devices with very different write speeds and requirements. It'd be even better if I could specify the dirty cache size as a mount option (though sane defaults or semi-automatic values based on runtime estimates wouldn't hurt).

Per-device dirty cache seems like a nice idea. I, for one, would like to disable it altogether, or reduce it to an absolute minimum, for things like USB flash drives - because I don't care about multithreaded performance or delayed allocation on such devices. I'm interested in my data reaching my USB stick ASAP, because that's how most people use them.

Regards,

Artem

^ permalink raw reply	[flat|nested] 21+ messages in thread
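For removable FAT media in particular, mount-level options that approximate this already exist; a hedged sketch (the device and mount point are examples, and the exact flushing behaviour depends on the kernel's vfat driver):

    # "flush" asks vfat to push data out early (e.g. when files are closed), so
    # progress and safe-removal times track the stick rather than the page cache
    mount -o flush /dev/sdb1 /mnt/usb

    # "sync" is stricter (synchronous writes) but much slower and harder on cheap flash
    mount -o sync /dev/sdb1 /mnt/usb
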
* Re: Disabling in-memory write cache for x86-64 in Linux II
  2013-10-25 18:26 ` Artem S. Tashkinov
@ 2013-10-25 19:40   ` Diego Calleja
  2013-10-25 23:32     ` Fengguang Wu
  0 siblings, 1 reply; 21+ messages in thread
From: Diego Calleja @ 2013-10-25 19:40 UTC (permalink / raw)
  To: Artem S. Tashkinov
  Cc: david, neilb, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm

On Friday, 25 October 2013 18:26:23, Artem S. Tashkinov wrote:
> Oct 25, 2013 05:26:45 PM, david wrote:
> > actually, I think the problem is more the impact of the huge write later
> > on.
> Exactly. And not being able to use applications which show you IO
> performance like Midnight Commander. You might prefer to use "cp -a" but I
> cannot imagine my life without being able to see the progress of a copying
> operation. With the current dirty cache there's no way to understand how
> you storage media actually behaves.

This is a problem I have also been suffering from for a long time. It's not so much how much dirty data the system caches or when it syncs it, but how unresponsive the desktop becomes when it happens (usually with rsync + large files). Most programs become completely unresponsive, especially if they have a large memory footprint (i.e. the browser). I need to pause rsync and wait until the system writes out all dirty data if I want to do simple things like scrolling, or take any other action that uses I/O; otherwise I need to wait minutes.

I have 16 GB of RAM and, excluding the browser (which usually uses about half a GB) and KDE itself, there are no memory hogs, so it seems like something that shouldn't happen. I can understand I/O operations being laggy when there is other intensive I/O ongoing, but right now the system becomes completely unresponsive. If I am unlucky and Konsole also becomes unresponsive, I need to switch to a VT (which also takes time).

I haven't reported it before, in part because I didn't know how to: "my browser stalls" is not a very useful description, and I didn't know what kind of data I'm supposed to report.

^ permalink raw reply	[flat|nested] 21+ messages in thread
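Until the defaults change, the usual userspace workarounds for the rsync case are to cap how fast it dirties memory and to lower its IO priority; a hedged sketch (the bandwidth figure is arbitrary, and ionice only affects rsync's own IO under CFQ - background writeback by the flusher threads is not reprioritised):

    # Idle IO class plus a ~20MB/s transfer cap (rsync's --bwlimit takes KB/s)
    ionice -c3 rsync --bwlimit=20000 -a /source/ /backup/
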
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 19:40 ` Diego Calleja @ 2013-10-25 23:32 ` Fengguang Wu 2013-11-15 15:48 ` Diego Calleja 0 siblings, 1 reply; 21+ messages in thread From: Fengguang Wu @ 2013-10-25 23:32 UTC (permalink / raw) To: Diego Calleja Cc: Artem S. Tashkinov, david, neilb, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm On Fri, Oct 25, 2013 at 09:40:13PM +0200, Diego Calleja wrote: > El Viernes, 25 de octubre de 2013 18:26:23 Artem S. Tashkinov escribió: > > Oct 25, 2013 05:26:45 PM, david wrote: > > >actually, I think the problem is more the impact of the huge write later > > >on. > > Exactly. And not being able to use applications which show you IO > > performance like Midnight Commander. You might prefer to use "cp -a" but I > > cannot imagine my life without being able to see the progress of a copying > > operation. With the current dirty cache there's no way to understand how > > you storage media actually behaves. > > > This is a problem I also have been suffering for a long time. It's not so much > how much and when the systems syncs dirty data, but how unreponsive the > desktop becomes when it happens (usually, with rsync + large files). Most > programs become completely unreponsive, specially if they have a large memory > consumption (ie. the browser). I need to pause rsync and wait until the > systems writes out all dirty data if I want to do simple things like scrolling > or do any action that uses I/O, otherwise I need to wait minutes. That's a problem. And it's kind of independent of the dirty threshold -- if you are doing large file copies in the background, it will lead to continuous disk writes and stalls anyway -- the large dirty threshold merely delays the write IO time. > I have 16 GB of RAM and excluding the browser (which usually uses about half > of a GB) and KDE itself, there are no memory hogs, so it seem like it's > something that shouldn't happen. I can understand that I/O operations are > laggy when there is some other intensive I/O ongoing, but right now the system > becomes completely unreponsive. If I am unlucky and Konsole also becomes > unreponsive, I need to switch to a VT (which also takes time). > > I haven't reported it before in part because I didn't know how to do it, "my > browser stalls" is not a very useful description and I didn't know what kind > of data I'm supposed to report. What's the kernel you are running? And it's writing to a hard disk? The stalls are most likely caused by either one of 1) write IO starves read IO 2) direct page reclaim blocked when - trying to writeout PG_dirty pages - trying to lock PG_writeback pages Which may be confirmed by running ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 or echo w > /proc/sysrq-trigger # and check dmesg during the stalls. The latter command works more reliably. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
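A small sketch building on the commands above, logging only the tasks stuck in uninterruptible sleep (D state) together with their wchan once a second, which is usually enough to see whether the stalls line up with writeback (the interval and log file name are arbitrary):

    while sleep 1; do
        date
        # keep only processes whose state contains D (blocked on IO)
        ps -eo pid,stat,comm,wchan:32 | awk '$2 ~ /D/'
    done >> dstate.log
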
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 23:32 ` Fengguang Wu @ 2013-11-15 15:48 ` Diego Calleja 0 siblings, 0 replies; 21+ messages in thread From: Diego Calleja @ 2013-11-15 15:48 UTC (permalink / raw) To: Fengguang Wu Cc: Artem S. Tashkinov, david, neilb, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm El Sábado, 26 de octubre de 2013 00:32:25 Fengguang Wu escribió: > What's the kernel you are running? And it's writing to a hard disk? > The stalls are most likely caused by either one of > > 1) write IO starves read IO > 2) direct page reclaim blocked when > - trying to writeout PG_dirty pages > - trying to lock PG_writeback pages > > Which may be confirmed by running > > ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 > or > echo w > /proc/sysrq-trigger # and check dmesg > > during the stalls. The latter command works more reliably. Sorry for the delay (background: rsync'ing large files from/to a hard disk in a desktop with 16GB of RAM makes the whole desktop unreponsive) I just triggered it today (running 3.12), and run sysrq-w: [ 5547.001505] SysRq : Show Blocked State [ 5547.001509] task PC stack pid father [ 5547.001516] btrfs-transacti D ffff880425d7a8a0 0 193 2 0x00000000 [ 5547.001519] ffff880425eede10 0000000000000002 ffff880425eedfd8 0000000000012e40 [ 5547.001521] ffff880425eedfd8 0000000000012e40 ffff880425d7a8a0 ffffea00104baa80 [ 5547.001523] ffff880425eedd90 ffff880425eedd68 ffff880425eedd70 ffffffff81080edd [ 5547.001525] Call Trace: [ 5547.001530] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001533] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001535] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001552] [<ffffffffa008a742>] ? btrfs_run_ordered_operations+0x212/0x2c0 [btrfs] [ 5547.001554] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001556] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001557] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.001559] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001566] [<ffffffffa0072215>] btrfs_commit_transaction+0x265/0x9d0 [btrfs] [ 5547.001569] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.001575] [<ffffffffa006982d>] transaction_kthread+0x19d/0x220 [btrfs] [ 5547.001581] [<ffffffffa0069690>] ? free_fs_root+0xc0/0xc0 [btrfs] [ 5547.001583] [<ffffffff81072e70>] kthread+0xc0/0xd0 [ 5547.001585] [<ffffffff81072db0>] ? kthread_create_on_node+0x120/0x120 [ 5547.001587] [<ffffffff81564bac>] ret_from_fork+0x7c/0xb0 [ 5547.001588] [<ffffffff81072db0>] ? kthread_create_on_node+0x120/0x120 [ 5547.001590] systemd-journal D ffff880426e19860 0 234 1 0x00000000 [ 5547.001592] ffff880426d77d90 0000000000000002 ffff880426d77fd8 0000000000012e40 [ 5547.001593] ffff880426d77fd8 0000000000012e40 ffff880426e19860 ffffffff8155d7cd [ 5547.001595] 0000000000000001 0000000000000001 0000000000000000 ffffffff81572560 [ 5547.001596] Call Trace: [ 5547.001598] [<ffffffff8155d7cd>] ? retint_restore_args+0xe/0xe [ 5547.001601] [<ffffffff8122b47b>] ? queue_unplugged+0x3b/0xe0 [ 5547.001602] [<ffffffff8122da9b>] ? blk_flush_plug_list+0x1eb/0x230 [ 5547.001604] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001606] [<ffffffff8155bb88>] schedule_preempt_disabled+0x18/0x30 [ 5547.001607] [<ffffffff8155a2f4>] __mutex_lock_slowpath+0x124/0x1f0 [ 5547.001613] [<ffffffffa0071c9b>] ? 
btrfs_write_marked_extents+0xbb/0xe0 [btrfs] [ 5547.001615] [<ffffffff8155a3d7>] mutex_lock+0x17/0x30 [ 5547.001623] [<ffffffffa00ae06a>] btrfs_sync_log+0x22a/0x690 [btrfs] [ 5547.001630] [<ffffffffa0082f47>] btrfs_sync_file+0x287/0x2e0 [btrfs] [ 5547.001632] [<ffffffff811abb96>] do_fsync+0x56/0x80 [ 5547.001634] [<ffffffff811abe20>] SyS_fsync+0x10/0x20 [ 5547.001635] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001644] mysqld D ffff8803f0901860 0 643 579 0x00000000 [ 5547.001645] ffff8803f090de18 0000000000000002 ffff8803f090dfd8 0000000000012e40 [ 5547.001647] ffff8803f090dfd8 0000000000012e40 ffff8803f0901860 ffff88016d038000 [ 5547.001648] ffff880426908d00 0000000024119d80 0000000000000000 0000000000000000 [ 5547.001650] Call Trace: [ 5547.001657] [<ffffffffa0074d14>] ? btrfs_submit_bio_hook+0x84/0x1f0 [btrfs] [ 5547.001659] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001660] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001662] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.001663] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001669] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.001671] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.001677] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.001680] [<ffffffff8112632e>] ? do_writepages+0x1e/0x40 [ 5547.001686] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.001693] [<ffffffffa0082e3f>] btrfs_sync_file+0x17f/0x2e0 [btrfs] [ 5547.001694] [<ffffffff811abb96>] do_fsync+0x56/0x80 [ 5547.001696] [<ffffffff811abe43>] SyS_fdatasync+0x13/0x20 [ 5547.001697] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001701] virtuoso-t D ffff88000310b0c0 0 617 609 0x00000000 [ 5547.001702] ffff8803f4867c20 0000000000000002 ffff8803f4867fd8 0000000000012e40 [ 5547.001704] ffff8803f4867fd8 0000000000012e40 ffff88000310b0c0 ffffffff813ce4af [ 5547.001705] ffffffff81860520 ffff8802d8ad8a00 ffff8803f4867ba0 ffffffff81231a0e [ 5547.001707] Call Trace: [ 5547.001709] [<ffffffff813ce4af>] ? scsi_pool_alloc_command+0x3f/0x80 [ 5547.001712] [<ffffffff81231a0e>] ? __blk_segment_map_sg+0x4e/0x120 [ 5547.001713] [<ffffffff81231b6b>] ? blk_rq_map_sg+0x8b/0x1f0 [ 5547.001716] [<ffffffff812481da>] ? cfq_dispatch_requests+0xba/0xc40 [ 5547.001718] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001721] [<ffffffff81119d70>] ? filemap_fdatawait+0x30/0x30 [ 5547.001722] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001723] [<ffffffff8155b9bf>] io_schedule+0x8f/0xe0 [ 5547.001725] [<ffffffff81119d7e>] sleep_on_page+0xe/0x20 [ 5547.001727] [<ffffffff81559142>] __wait_on_bit+0x62/0x90 [ 5547.001728] [<ffffffff81119b2f>] wait_on_page_bit+0x7f/0x90 [ 5547.001730] [<ffffffff81073da0>] ? wake_atomic_t_function+0x40/0x40 [ 5547.001732] [<ffffffff81119cbb>] filemap_fdatawait_range+0x11b/0x1a0 [ 5547.001734] [<ffffffff8155cc86>] ? 
_raw_spin_unlock+0x16/0x40 [ 5547.001740] [<ffffffffa0071d47>] btrfs_wait_marked_extents+0x87/0xe0 [btrfs] [ 5547.001747] [<ffffffffa00ae328>] btrfs_sync_log+0x4e8/0x690 [btrfs] [ 5547.001754] [<ffffffffa0082f47>] btrfs_sync_file+0x287/0x2e0 [btrfs] [ 5547.001756] [<ffffffff811abb96>] do_fsync+0x56/0x80 [ 5547.001758] [<ffffffff811abe20>] SyS_fsync+0x10/0x20 [ 5547.001759] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001761] pool D ffff88040db1c100 0 657 477 0x00000000 [ 5547.001763] ffff8803ee809ba0 0000000000000002 ffff8803ee809fd8 0000000000012e40 [ 5547.001764] ffff8803ee809fd8 0000000000012e40 ffff88040db1c100 0000000000000004 [ 5547.001766] ffff8803ee809ae8 ffffffff8155cc86 ffff8803ee809bd0 ffffffffa005ada4 [ 5547.001767] Call Trace: [ 5547.001769] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001775] [<ffffffffa005ada4>] ? reserve_metadata_bytes+0x184/0x930 [btrfs] [ 5547.001776] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001778] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001779] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001781] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001783] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.001784] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001790] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.001792] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.001798] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.001804] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.001810] [<ffffffffa0080b8b>] btrfs_create+0x3b/0x200 [btrfs] [ 5547.001813] [<ffffffff8120ce3c>] ? security_inode_permission+0x1c/0x30 [ 5547.001815] [<ffffffff81189634>] vfs_create+0xb4/0x120 [ 5547.001817] [<ffffffff8118bcd4>] do_last+0x904/0xea0 [ 5547.001818] [<ffffffff81188cc0>] ? link_path_walk+0x70/0x930 [ 5547.001820] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001822] [<ffffffff8120d0e6>] ? security_file_alloc+0x16/0x20 [ 5547.001824] [<ffffffff8118c32b>] path_openat+0xbb/0x6b0 [ 5547.001827] [<ffffffff810dd64f>] ? __acct_update_integrals+0x7f/0x100 [ 5547.001829] [<ffffffff81085782>] ? account_system_time+0xa2/0x180 [ 5547.001831] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001833] [<ffffffff8118d7ca>] do_filp_open+0x3a/0x90 [ 5547.001834] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001836] [<ffffffff81199e47>] ? __alloc_fd+0xa7/0x130 [ 5547.001839] [<ffffffff8117ce89>] do_sys_open+0x129/0x220 [ 5547.001842] [<ffffffff8100e795>] ? syscall_trace_enter+0x135/0x230 [ 5547.001844] [<ffffffff8117cf9e>] SyS_open+0x1e/0x20 [ 5547.001845] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001850] akregator D ffff8803ed1d4100 0 875 1 0x00000000 [ 5547.001851] ffff8803c7f1bba0 0000000000000002 ffff8803c7f1bfd8 0000000000012e40 [ 5547.001853] ffff8803c7f1bfd8 0000000000012e40 ffff8803ed1d4100 0000000000000004 [ 5547.001854] ffff8803c7f1bae8 ffffffff8155cc86 ffff8803c7f1bbd0 ffffffffa005ada4 [ 5547.001856] Call Trace: [ 5547.001858] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001863] [<ffffffffa005ada4>] ? reserve_metadata_bytes+0x184/0x930 [btrfs] [ 5547.001865] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001866] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001868] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001870] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001871] [<ffffffff8155d006>] ? 
_raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.001873] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001879] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.001881] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.001886] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.001888] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001894] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.001900] [<ffffffffa0080b8b>] btrfs_create+0x3b/0x200 [btrfs] [ 5547.001902] [<ffffffff8120ce3c>] ? security_inode_permission+0x1c/0x30 [ 5547.001904] [<ffffffff81189634>] vfs_create+0xb4/0x120 [ 5547.001906] [<ffffffff8118bcd4>] do_last+0x904/0xea0 [ 5547.001907] [<ffffffff81188cc0>] ? link_path_walk+0x70/0x930 [ 5547.001909] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001911] [<ffffffff8120d0e6>] ? security_file_alloc+0x16/0x20 [ 5547.001912] [<ffffffff8118c32b>] path_openat+0xbb/0x6b0 [ 5547.001914] [<ffffffff810dd64f>] ? __acct_update_integrals+0x7f/0x100 [ 5547.001916] [<ffffffff81085782>] ? account_system_time+0xa2/0x180 [ 5547.001918] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001920] [<ffffffff8118d7ca>] do_filp_open+0x3a/0x90 [ 5547.001921] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001923] [<ffffffff81199e47>] ? __alloc_fd+0xa7/0x130 [ 5547.001925] [<ffffffff8117ce89>] do_sys_open+0x129/0x220 [ 5547.001927] [<ffffffff8100e795>] ? syscall_trace_enter+0x135/0x230 [ 5547.001928] [<ffffffff8117cf9e>] SyS_open+0x1e/0x20 [ 5547.001930] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001931] mpegaudioparse3 D ffff880341d10820 0 5917 1 0x00000000 [ 5547.001933] ffff88030f779ce0 0000000000000002 ffff88030f779fd8 0000000000012e40 [ 5547.001934] ffff88030f779fd8 0000000000012e40 ffff880341d10820 ffffffff81122a28 [ 5547.001936] ffff88043e5ddc00 ffff880400000002 ffff88043e2138d0 0000000000000000 [ 5547.001938] Call Trace: [ 5547.001939] [<ffffffff81122a28>] ? __alloc_pages_nodemask+0x158/0xb00 [ 5547.001941] [<ffffffff8102af55>] ? native_send_call_func_single_ipi+0x35/0x40 [ 5547.001943] [<ffffffff810b31a8>] ? generic_exec_single+0x98/0xa0 [ 5547.001945] [<ffffffff81086a18>] ? __enqueue_entity+0x78/0x80 [ 5547.001947] [<ffffffff8108a837>] ? enqueue_entity+0x197/0x780 [ 5547.001948] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001950] [<ffffffff81119d90>] ? sleep_on_page+0x20/0x20 [ 5547.001951] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.001953] [<ffffffff8155b9bf>] io_schedule+0x8f/0xe0 [ 5547.001954] [<ffffffff81119d9e>] sleep_on_page_killable+0xe/0x40 [ 5547.001956] [<ffffffff8155925d>] __wait_on_bit_lock+0x5d/0xc0 [ 5547.001958] [<ffffffff81119f2a>] __lock_page_killable+0x6a/0x70 [ 5547.001960] [<ffffffff81073da0>] ? wake_atomic_t_function+0x40/0x40 [ 5547.001961] [<ffffffff8111b9e5>] generic_file_aio_read+0x435/0x700 [ 5547.001963] [<ffffffff8117d2ba>] do_sync_read+0x5a/0x90 [ 5547.001965] [<ffffffff8117d85a>] vfs_read+0x9a/0x170 [ 5547.001967] [<ffffffff8117e039>] SyS_read+0x49/0xa0 [ 5547.001968] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.001970] mozStorage #2 D ffff8803b7aa1860 0 920 477 0x00000000 [ 5547.001972] ffff8803b1473d80 0000000000000002 ffff8803b1473fd8 0000000000012e40 [ 5547.001974] ffff8803b1473fd8 0000000000012e40 ffff8803b7aa1860 0000000000000004 [ 5547.001975] ffff8803b1473cc8 ffffffff8155cc86 ffff8803b1473db0 ffffffffa005ada4 [ 5547.001977] Call Trace: [ 5547.001978] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.001984] [<ffffffffa005ada4>] ? 
reserve_metadata_bytes+0x184/0x930 [btrfs] [ 5547.001990] [<ffffffffa0084729>] ? __btrfs_buffered_write+0x3d9/0x490 [btrfs] [ 5547.001992] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.001994] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.001995] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.001997] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002003] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.002004] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.002010] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.002016] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.002023] [<ffffffffa007c8a1>] btrfs_setattr+0x101/0x290 [btrfs] [ 5547.002025] [<ffffffff810d675c>] ? rcu_eqs_enter+0x5c/0xa0 [ 5547.002027] [<ffffffff81198a6c>] notify_change+0x1dc/0x360 [ 5547.002029] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002030] [<ffffffff8117bdcb>] do_truncate+0x6b/0xa0 [ 5547.002032] [<ffffffff8117f8b9>] ? __sb_start_write+0x49/0x100 [ 5547.002033] [<ffffffff8117c12b>] SyS_ftruncate+0x10b/0x160 [ 5547.002035] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.002036] Cache I/O D ffff8803b7aa28a0 0 922 477 0x00000000 [ 5547.002038] ffff8803b1495e18 0000000000000002 ffff8803b1495fd8 0000000000012e40 [ 5547.002039] ffff8803b1495fd8 0000000000012e40 ffff8803b7aa28a0 ffff8803b1495e08 [ 5547.002041] ffff8803b1495db0 ffffffff8111a25a ffff8803b1495e40 ffff8803b1495df0 [ 5547.002043] Call Trace: [ 5547.002045] [<ffffffff8111a25a>] ? find_get_pages_tag+0xea/0x180 [ 5547.002047] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002048] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002050] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.002051] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002057] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.002059] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.002065] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.002071] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.002077] [<ffffffffa0082e3f>] btrfs_sync_file+0x17f/0x2e0 [btrfs] [ 5547.002079] [<ffffffff811abb96>] do_fsync+0x56/0x80 [ 5547.002080] [<ffffffff811abe20>] SyS_fsync+0x10/0x20 [ 5547.002081] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.002083] mozStorage #6 D ffff8803c0cfa8a0 0 982 477 0x00000000 [ 5547.002085] ffff8803a10f5ba0 0000000000000002 ffff8803a10f5fd8 0000000000012e40 [ 5547.002086] ffff8803a10f5fd8 0000000000012e40 ffff8803c0cfa8a0 0000000000000004 [ 5547.002088] ffff8803a10f5ae8 ffffffff8155cc86 ffff8803a10f5bd0 ffffffffa005ada4 [ 5547.002089] Call Trace: [ 5547.002091] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.002096] [<ffffffffa005ada4>] ? reserve_metadata_bytes+0x184/0x930 [btrfs] [ 5547.002098] [<ffffffff8102b067>] ? native_smp_send_reschedule+0x47/0x60 [ 5547.002100] [<ffffffff8107f7bc>] ? resched_task+0x5c/0x60 [ 5547.002101] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002103] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002104] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.002106] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002112] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.002113] [<ffffffff81073d20>] ? 
wake_up_atomic_t+0x30/0x30 [ 5547.002119] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.002125] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.002131] [<ffffffffa0080b8b>] btrfs_create+0x3b/0x200 [btrfs] [ 5547.002133] [<ffffffff8120ce3c>] ? security_inode_permission+0x1c/0x30 [ 5547.002134] [<ffffffff81189634>] vfs_create+0xb4/0x120 [ 5547.002136] [<ffffffff8118bcd4>] do_last+0x904/0xea0 [ 5547.002138] [<ffffffff81188cc0>] ? link_path_walk+0x70/0x930 [ 5547.002139] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002141] [<ffffffff8120d0e6>] ? security_file_alloc+0x16/0x20 [ 5547.002143] [<ffffffff8118c32b>] path_openat+0xbb/0x6b0 [ 5547.002145] [<ffffffff810dd64f>] ? __acct_update_integrals+0x7f/0x100 [ 5547.002147] [<ffffffff81085782>] ? account_system_time+0xa2/0x180 [ 5547.002148] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002150] [<ffffffff8118d7ca>] do_filp_open+0x3a/0x90 [ 5547.002152] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.002153] [<ffffffff81199e47>] ? __alloc_fd+0xa7/0x130 [ 5547.002155] [<ffffffff8117ce89>] do_sys_open+0x129/0x220 [ 5547.002157] [<ffffffff8100e795>] ? syscall_trace_enter+0x135/0x230 [ 5547.002159] [<ffffffff8117cf9e>] SyS_open+0x1e/0x20 [ 5547.002160] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.002164] rsync D ffff8802dcde0820 0 5803 5802 0x00000000 [ 5547.002165] ffff8802daeb1a90 0000000000000002 ffff8802daeb1fd8 0000000000012e40 [ 5547.002167] ffff8802daeb1fd8 0000000000012e40 ffff8802dcde0820 ffff880100000002 [ 5547.002169] ffff8802daeb19e0 ffffffff81080edd ffff880308b337e0 0000000000000000 [ 5547.002170] Call Trace: [ 5547.002172] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002173] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002175] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002177] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002178] [<ffffffff81560e8d>] ? add_preempt_count+0x3d/0x40 [ 5547.002180] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002181] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002182] [<ffffffff81558f6a>] schedule_timeout+0x11a/0x230 [ 5547.002185] [<ffffffff8105e0c0>] ? detach_if_pending+0x120/0x120 [ 5547.002187] [<ffffffff810a5078>] ? ktime_get_ts+0x48/0xe0 [ 5547.002189] [<ffffffff8155bd2b>] io_schedule_timeout+0x9b/0xf0 [ 5547.002191] [<ffffffff811259a9>] balance_dirty_pages_ratelimited+0x3d9/0xa10 [ 5547.002198] [<ffffffffa0c9ad84>] ? ext4_dirty_inode+0x54/0x60 [ext4] [ 5547.002200] [<ffffffff8111a8c8>] generic_file_buffered_write+0x1b8/0x290 [ 5547.002202] [<ffffffff8111bfd9>] __generic_file_aio_write+0x1a9/0x3b0 [ 5547.002203] [<ffffffff8111c238>] generic_file_aio_write+0x58/0xa0 [ 5547.002208] [<ffffffffa0c8ef79>] ext4_file_write+0x99/0x3e0 [ext4] [ 5547.002210] [<ffffffff810ddaac>] ? acct_account_cputime+0x1c/0x20 [ 5547.002212] [<ffffffff81085782>] ? account_system_time+0xa2/0x180 [ 5547.002213] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002215] [<ffffffff81080edd>] ? 
get_parent_ip+0xd/0x50 [ 5547.002216] [<ffffffff8117d34a>] do_sync_write+0x5a/0x90 [ 5547.002218] [<ffffffff8117d9ed>] vfs_write+0xbd/0x1e0 [ 5547.002220] [<ffffffff8117e0d9>] SyS_write+0x49/0xa0 [ 5547.002221] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.002223] ktorrent D ffff8802e7680820 0 5806 1 0x00000000 [ 5547.002224] ffff8802daf7fba0 0000000000000002 ffff8802daf7ffd8 0000000000012e40 [ 5547.002226] ffff8802daf7ffd8 0000000000012e40 ffff8802e7680820 0000000000000004 [ 5547.002227] ffff8802daf7fae8 ffffffff8155cc86 ffff8802daf7fbd0 ffffffffa005ada4 [ 5547.002229] Call Trace: [ 5547.002230] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.002236] [<ffffffffa005ada4>] ? reserve_metadata_bytes+0x184/0x930 [btrfs] [ 5547.002241] [<ffffffffa004ae49>] ? btrfs_set_path_blocking+0x39/0x80 [btrfs] [ 5547.002246] [<ffffffffa004fe78>] ? btrfs_search_slot+0x498/0x970 [btrfs] [ 5547.002247] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002249] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002251] [<ffffffff8155d006>] ? _raw_spin_unlock_irqrestore+0x26/0x60 [ 5547.002252] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002258] [<ffffffffa007170f>] wait_current_trans.isra.17+0xbf/0x120 [btrfs] [ 5547.002260] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.002266] [<ffffffffa0072cff>] start_transaction+0x37f/0x570 [btrfs] [ 5547.002268] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002273] [<ffffffffa0072f0b>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 5547.002280] [<ffffffffa0080b8b>] btrfs_create+0x3b/0x200 [btrfs] [ 5547.002281] [<ffffffff8120ce3c>] ? security_inode_permission+0x1c/0x30 [ 5547.002283] [<ffffffff81189634>] vfs_create+0xb4/0x120 [ 5547.002285] [<ffffffff8118bcd4>] do_last+0x904/0xea0 [ 5547.002287] [<ffffffff81188cc0>] ? link_path_walk+0x70/0x930 [ 5547.002288] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002290] [<ffffffff8120d0e6>] ? security_file_alloc+0x16/0x20 [ 5547.002292] [<ffffffff8118c32b>] path_openat+0xbb/0x6b0 [ 5547.002293] [<ffffffff810dd64f>] ? __acct_update_integrals+0x7f/0x100 [ 5547.002295] [<ffffffff81085782>] ? account_system_time+0xa2/0x180 [ 5547.002297] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002299] [<ffffffff8118d7ca>] do_filp_open+0x3a/0x90 [ 5547.002300] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.002302] [<ffffffff81199e47>] ? __alloc_fd+0xa7/0x130 [ 5547.002304] [<ffffffff8117ce89>] do_sys_open+0x129/0x220 [ 5547.002306] [<ffffffff8100e795>] ? syscall_trace_enter+0x135/0x230 [ 5547.002307] [<ffffffff8117cf9e>] SyS_open+0x1e/0x20 [ 5547.002309] [<ffffffff81564e5f>] tracesys+0xdd/0xe2 [ 5547.002311] kworker/u16:0 D ffff88035c5ac920 0 6043 2 0x00000000 [ 5547.002313] Workqueue: writeback bdi_writeback_workfn (flush-8:32) [ 5547.002315] ffff88036c9cb898 0000000000000002 ffff88036c9cbfd8 0000000000012e40 [ 5547.002316] ffff88036c9cbfd8 0000000000012e40 ffff88035c5ac920 ffff8804281de048 [ 5547.002318] ffff88036c9cb7e8 ffffffff81080edd 0000000000000001 ffff88036c9cb800 [ 5547.002319] Call Trace: [ 5547.002321] [<ffffffff81080edd>] ? get_parent_ip+0xd/0x50 [ 5547.002323] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002324] [<ffffffff8155cc86>] ? _raw_spin_unlock+0x16/0x40 [ 5547.002326] [<ffffffff8122b47b>] ? queue_unplugged+0x3b/0xe0 [ 5547.002328] [<ffffffff8155b719>] schedule+0x29/0x70 [ 5547.002329] [<ffffffff8155b9bf>] io_schedule+0x8f/0xe0 [ 5547.002331] [<ffffffff8122b8aa>] get_request+0x1aa/0x780 [ 5547.002332] [<ffffffff8123099e>] ? 
ioc_lookup_icq+0x4e/0x80 [ 5547.002334] [<ffffffff81073d20>] ? wake_up_atomic_t+0x30/0x30 [ 5547.002336] [<ffffffff8122db58>] blk_queue_bio+0x78/0x3e0 [ 5547.002337] [<ffffffff8122c5c2>] generic_make_request+0xc2/0x110 [ 5547.002338] [<ffffffff8122c683>] submit_bio+0x73/0x160 [ 5547.002344] [<ffffffffa0c9bae5>] ext4_io_submit+0x25/0x50 [ext4] [ 5547.002348] [<ffffffffa0c981d3>] ext4_writepages+0x823/0xe00 [ext4] [ 5547.002350] [<ffffffff8112632e>] do_writepages+0x1e/0x40 [ 5547.002352] [<ffffffff811a6340>] __writeback_single_inode+0x40/0x330 [ 5547.002353] [<ffffffff811a7392>] writeback_sb_inodes+0x262/0x450 [ 5547.002355] [<ffffffff811a761f>] __writeback_inodes_wb+0x9f/0xd0 [ 5547.002357] [<ffffffff811a797b>] wb_writeback+0x32b/0x360 [ 5547.002358] [<ffffffff811a8111>] bdi_writeback_workfn+0x221/0x510 [ 5547.002361] [<ffffffff8106b917>] process_one_work+0x167/0x450 [ 5547.002362] [<ffffffff8106c6a1>] worker_thread+0x121/0x3a0 [ 5547.002364] [<ffffffff81560ed9>] ? sub_preempt_count+0x49/0x50 [ 5547.002366] [<ffffffff8106c580>] ? manage_workers.isra.25+0x2a0/0x2a0 [ 5547.002367] [<ffffffff81072e70>] kthread+0xc0/0xd0 [ 5547.002369] [<ffffffff81072db0>] ? kthread_create_on_node+0x120/0x120 [ 5547.002371] [<ffffffff81564bac>] ret_from_fork+0x7c/0xb0 [ 5547.002372] [<ffffffff81072db0>] ? kthread_create_on_node+0x120/0x120 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 18:26 ` Artem S. Tashkinov 2013-10-25 19:40 ` Diego Calleja @ 2013-10-25 20:43 ` NeilBrown 2013-10-25 21:03 ` Artem S. Tashkinov 2013-10-29 20:49 ` Jan Kara 2 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2013-10-25 20:43 UTC (permalink / raw) To: Artem S. Tashkinov Cc: david, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm [-- Attachment #1: Type: text/plain, Size: 1836 bytes --] On Fri, 25 Oct 2013 18:26:23 +0000 (UTC) "Artem S. Tashkinov" <t.artem@lycos.com> wrote: > Oct 25, 2013 05:26:45 PM, david wrote: > On Fri, 25 Oct 2013, NeilBrown wrote: > > > >> > >> What exactly is bothering you about this? The amount of memory used or the > >> time until data is flushed? > > > >actually, I think the problem is more the impact of the huge write later on. > > Exactly. And not being able to use applications which show you IO performance > like Midnight Commander. You might prefer to use "cp -a" but I cannot imagine > my life without being able to see the progress of a copying operation. With the current > dirty cache there's no way to understand how you storage media actually behaves. So fix Midnight Commander. If you want the copy to be actually finished when it says it is finished, then it needs to call 'fsync()' at the end. > > Hopefully this issue won't dissolve into obscurity and someone will actually make > up a plan (and a patch) how to make dirty write cache behave in a sane manner > considering the fact that there are devices with very different write speeds and > requirements. It'd be ever better, if I could specify dirty cache as a mount option > (though sane defaults or semi-automatic values based on runtime estimates > won't hurt). > > Per device dirty cache seems like a nice idea, I, for one, would like to disable it > altogether or make it an absolute minimum for things like USB flash drives - because > I don't care about multithreaded performance or delayed allocation on such devices - > I'm interested in my data reaching my USB stick ASAP - because it's how most people > use them. > As has already been said, you can substantially disable the cache by tuning down various values in /proc/sys/vm/. Have you tried? NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
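For illustration only, here is a minimal sketch of what "call 'fsync()' at the end" means for a copy tool: completion is reported only after fsync() on the destination returns, so "finished" means the data has actually left the page cache. The function name and 64 KB buffer are arbitrary choices for the example, not anything from mc or this thread.

```c
/* Sketch: a copy that only claims success after fsync().
 * Illustrative only; error handling is abbreviated and short
 * writes are treated as failures. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int copy_and_sync(const char *src, const char *dst)
{
	char buf[64 * 1024];
	ssize_t n;
	int in = open(src, O_RDONLY);
	int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (in < 0 || out < 0)
		return -1;

	while ((n = read(in, buf, sizeof(buf))) > 0)
		if (write(out, buf, n) != n)
			return -1;

	/* Without this, "done" only means "sitting in the page cache". */
	if (fsync(out) != 0)
		return -1;

	close(in);
	close(out);
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
		return 1;
	}
	return copy_and_sync(argv[1], argv[2]) ? 1 : 0;
}
```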
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 20:43 ` NeilBrown @ 2013-10-25 21:03 ` Artem S. Tashkinov 2013-10-25 22:11 ` NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: Artem S. Tashkinov @ 2013-10-25 21:03 UTC (permalink / raw) To: neilb; +Cc: david, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm Oct 26, 2013 02:44:07 AM, neil wrote: On Fri, 25 Oct 2013 18:26:23 +0000 (UTC) "Artem S. Tashkinov" >> >> Exactly. And not being able to use applications which show you IO performance >> like Midnight Commander. You might prefer to use "cp -a" but I cannot imagine >> my life without being able to see the progress of a copying operation. With the current >> dirty cache there's no way to understand how you storage media actually behaves. > >So fix Midnight Commander. If you want the copy to be actually finished when >it says it is finished, then it needs to call 'fsync()' at the end. This sounds like a very bad joke. How are applications supposed to show and calculate an _average_ write speed if there are no kernel calls/ioctls to actually make the kernel flush dirty buffers _during_ copying? Admittedly, that would be a good way to solve this problem in user space - alas, even if such calls are implemented, user space will only start using them in 2018, if not later. >> >> Per device dirty cache seems like a nice idea, I, for one, would like to disable it >> altogether or make it an absolute minimum for things like USB flash drives - because >> I don't care about multithreaded performance or delayed allocation on such devices - >> I'm interested in my data reaching my USB stick ASAP - because it's how most people >> use them. >> > >As has already been said, you can substantially disable the cache by tuning >down various values in /proc/sys/vm/. >Have you tried? I don't understand who you are replying to. I asked about per-device settings, yet you are again referring me to system-wide settings - they don't look that good if we're talking about a 3MB/sec flash drive and a 500MB/sec SSD. Besides, it makes no sense to allocate 20% of physical RAM for things which don't belong in it in the first place. I don't know of any other OS which behaves this way. And like people (including me) have already mentioned, such a huge dirty cache can stall their PCs/servers for a considerable amount of time. Of course, if you don't use Linux on the desktop you don't really care - well, I do. Also, not everyone in this world has a UPS - which means such a huge buffer can lead to serious data loss in case of a power blackout. Regards, Artem -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 21:03 ` Artem S. Tashkinov @ 2013-10-25 22:11 ` NeilBrown 2013-11-05 1:40 ` Figo.zhang 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2013-10-25 22:11 UTC (permalink / raw) To: Artem S. Tashkinov Cc: david, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm [-- Attachment #1: Type: text/plain, Size: 4860 bytes --] On Fri, 25 Oct 2013 21:03:44 +0000 (UTC) "Artem S. Tashkinov" <t.artem@lycos.com> wrote: > Oct 26, 2013 02:44:07 AM, neil wrote: > On Fri, 25 Oct 2013 18:26:23 +0000 (UTC) "Artem S. Tashkinov" > >> > >> Exactly. And not being able to use applications which show you IO performance > >> like Midnight Commander. You might prefer to use "cp -a" but I cannot imagine > >> my life without being able to see the progress of a copying operation. With the current > >> dirty cache there's no way to understand how you storage media actually behaves. > > > >So fix Midnight Commander. If you want the copy to be actually finished when > >it says it is finished, then it needs to call 'fsync()' at the end. > > This sounds like a very bad joke. How applications are supposed to show and > calculate an _average_ write speed if there are no kernel calls/ioctls to actually > make the kernel flush dirty buffers _during_ copying? Actually it's a good way to > solve this problem in user space - alas, even if such calls are implemented, user > space will start using them only in 2018 if not further from that. But there is a way to flush dirty buffers *during* copies. man 2 sync_file_range If giving precise feedback is of paramount importance to you, then this would be the interface to use. > > >> > >> Per device dirty cache seems like a nice idea, I, for one, would like to disable it > >> altogether or make it an absolute minimum for things like USB flash drives - because > >> I don't care about multithreaded performance or delayed allocation on such devices - > >> I'm interested in my data reaching my USB stick ASAP - because it's how most people > >> use them. > >> > > > >As has already been said, you can substantially disable the cache by tuning > >down various values in /proc/sys/vm/. > >Have you tried? > > I don't understand who you are replying to. I asked about per device settings, you are > again referring me to system wide settings - they don't look that good if we're talking > about a 3MB/sec flash drive and 500MB/sec SSD drive. Besides it makes no sense > to allocate 20% of physical RAM for things which don't belong to it in the first place. Sorry, missed the per-device bit. You could try playing with /sys/class/bdi/XX:YY/max_ratio where XX:YY is the major/minor number of the device, so 8:0 for /dev/sda. Wind it right down for slow devices and you might get something like what you want. > > I don't know any other OS which has a similar behaviour. I don't know about the internal details of any other OS, so I cannot really comment. > > And like people (including me) have already mentioned, such a huge dirty cache can > stall their PCs/servers for a considerable amount of time. Yes. But this is a different issue. There are two very different issues that should be kept separate. One is that when "cp" or similar completes, the data hasn't all been written out yet. It typically takes another 30 seconds before the flush will complete. You seemed to primarily complain about this, so that is what I originally addressed. That is where the "dirty_*_centisecs" values apply.
The other, quite separate, issue is that Linux will cache more dirty data than it can write out in a reasonable time. All the tuning parameters refer to the amount of data (whether as a percentage of RAM or as a number of bytes), but what people really care about is a number of seconds. As you might imagine, estimating how long it will take to write out a certain amount of data is highly non-trivial. The relationship between megabytes and seconds can be non-linear and can change over time. Caching nothing at all can hurt a lot of workloads. Caching too much can obviously hurt too. Caching "5 seconds" worth of data would be ideal, but would be incredibly difficult to implement. Keeping a sliding estimate of throughput for each device, and using that to automatically adjust the "max_ratio" value (or some related internal thing), might be a 70% solution. Certainly it would be an interesting project for someone. > > Of course, if you don't use Linux on the desktop you don't really care - well, I do. Also > not everyone in this world has an UPS - which means such a huge buffer can lead to a > serious data loss in case of a power blackout. I don't have a desk (just a lap), but I use Linux on all my computers and I've never really noticed the problem. Maybe I'm just very patient, or maybe I don't work with large data sets and slow devices. However, I don't think data loss is really a related issue. Any process that cares about data safety *must* use fsync at appropriate places. This has always been true. NeilBrown > > Regards, > > Artem [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
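A sketch of the sync_file_range() approach described above, so the displayed progress and MB/s track the device rather than the page cache. This is illustrative only: the 8 MB window is an arbitrary choice, and a real tool would likely overlap writeback of one chunk with reading the next instead of waiting synchronously.

```c
/* Sketch: copy with a bounded dirty window and honest progress.
 * Each chunk is pushed to the device and waited for before it is
 * counted, so the dirty cache never hides the true write speed.
 * Assumes Linux/glibc (sync_file_range is Linux-specific). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (8 * 1024 * 1024)	/* flush window: 8 MB, arbitrary */

int main(int argc, char **argv)
{
	static char buf[CHUNK];
	off_t done = 0;
	ssize_t n;
	int in, out;

	if (argc != 3) {
		fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
		return 1;
	}
	in = open(argv[1], O_RDONLY);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (write(out, buf, n) != n) {
			perror("write");
			return 1;
		}
		/* Start writeback of this chunk and wait for it to reach
		 * the device before reporting it as progress. */
		if (sync_file_range(out, done, n,
				    SYNC_FILE_RANGE_WAIT_BEFORE |
				    SYNC_FILE_RANGE_WRITE |
				    SYNC_FILE_RANGE_WAIT_AFTER) != 0)
			perror("sync_file_range");
		done += n;
		fprintf(stderr, "\r%lld MB on device", (long long)(done >> 20));
	}
	fprintf(stderr, "\n");

	fsync(out);	/* sync_file_range does not cover metadata */
	close(in);
	close(out);
	return 0;
}
```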
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 22:11 ` NeilBrown @ 2013-11-05 1:40 ` Figo.zhang 2013-11-05 1:47 ` David Lang 2013-11-05 2:08 ` NeilBrown 0 siblings, 2 replies; 21+ messages in thread From: Figo.zhang @ 2013-11-05 1:40 UTC (permalink / raw) To: NeilBrown Cc: Artem S. Tashkinov, david, lkml, Linus Torvalds, linux-fsdevel, axboe, Linux-MM [-- Attachment #1: Type: text/plain, Size: 882 bytes --] > > > > Of course, if you don't use Linux on the desktop you don't really care - > well, I do. Also > > not everyone in this world has an UPS - which means such a huge buffer > can lead to a > > serious data loss in case of a power blackout. > > I don't have a desk (just a lap), but I use Linux on all my computers and > I've never really noticed the problem. Maybe I'm just very patient, or > maybe > I don't work with large data sets and slow devices. > > However I don't think data-loss is really a related issue. Any process > that > cares about data safety *must* use fsync at appropriate places. This has > always been true. > > => May I ask a question: on a filesystem like ext4, when an application modifies files it creates dirty data. If a power blackout hits while the metadata is being written to the journal, will data be lost and the file end up damaged? [-- Attachment #2: Type: text/html, Size: 1204 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-05 1:40 ` Figo.zhang @ 2013-11-05 1:47 ` David Lang 2013-11-05 2:08 ` NeilBrown 1 sibling, 0 replies; 21+ messages in thread From: David Lang @ 2013-11-05 1:47 UTC (permalink / raw) To: Figo.zhang Cc: NeilBrown, Artem S. Tashkinov, lkml, Linus Torvalds, linux-fsdevel, axboe, Linux-MM On Tue, 5 Nov 2013, Figo.zhang wrote: >>> >>> Of course, if you don't use Linux on the desktop you don't really care - >> well, I do. Also >>> not everyone in this world has an UPS - which means such a huge buffer >> can lead to a >>> serious data loss in case of a power blackout. >> >> I don't have a desk (just a lap), but I use Linux on all my computers and >> I've never really noticed the problem. Maybe I'm just very patient, or >> maybe >> I don't work with large data sets and slow devices. >> >> However I don't think data-loss is really a related issue. Any process >> that >> cares about data safety *must* use fsync at appropriate places. This has >> always been true. >> >> =>May i ask question that, some like ext4 filesystem, if some app motify > the files, it create some dirty data. if some meta-data writing to the > journal disk when a power backout, > it will be lose some serious data and the the file will damage? > With any filesystem and any OS, if you create dirty data but do not f*sync() the data, there is a possibility that the system can go down between the time the application creates the dirty data and the time the OS actually gets it on disk. If the system goes down in this timeframe, the data will be lost, and the file may be corrupted if only some of the data got written. David Lang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-11-05 1:40 ` Figo.zhang 2013-11-05 1:47 ` David Lang @ 2013-11-05 2:08 ` NeilBrown 1 sibling, 0 replies; 21+ messages in thread From: NeilBrown @ 2013-11-05 2:08 UTC (permalink / raw) To: Figo.zhang Cc: Artem S. Tashkinov, david, lkml, Linus Torvalds, linux-fsdevel, axboe, Linux-MM [-- Attachment #1: Type: text/plain, Size: 1632 bytes --] On Tue, 5 Nov 2013 09:40:55 +0800 "Figo.zhang" <figo1802@gmail.com> wrote: > > > > > > Of course, if you don't use Linux on the desktop you don't really care - > > well, I do. Also > > > not everyone in this world has an UPS - which means such a huge buffer > > can lead to a > > > serious data loss in case of a power blackout. > > > > I don't have a desk (just a lap), but I use Linux on all my computers and > > I've never really noticed the problem. Maybe I'm just very patient, or > > maybe > > I don't work with large data sets and slow devices. > > > > However I don't think data-loss is really a related issue. Any process > > that > > cares about data safety *must* use fsync at appropriate places. This has > > always been true. > > > > =>May i ask question that, some like ext4 filesystem, if some app motify > the files, it create some dirty data. if some meta-data writing to the > journal disk when a power backout, > it will be lose some serious data and the the file will damage? If you modify a file, then you must take care that you can recover from a crash at any point in the process. If the file is small, the usual approach is to create a copy of the file with the appropriate changes made, then 'fsync' the file and rename the new file over the old file. If the file is large you might need some sort of update log (in a small file) so you can replay recent updates after a crash. The journalling that the filesystem provides only protects the filesystem metadata. It does not protect the consistency of the data in your file. I hope that helps. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
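A sketch of the small-file pattern described above: write a new copy, fsync it, rename it over the old name, then fsync the directory so the rename itself is durable. The helper name, the ".tmp" suffix, and the assumption that the file lives in the current directory are illustrative choices, not anything from the thread.

```c
/* Sketch: crash-safe replacement of a small file.
 * Assumes "path" is relative to the current directory; a real
 * implementation would fsync the file's actual parent directory
 * and handle errors more carefully. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int replace_file(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	int fd, dirfd;

	snprintf(tmp, sizeof(tmp), "%s.tmp", path);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);

	/* rename() is atomic: a crash leaves either the old file or the
	 * complete new one, never a half-written mix. */
	if (rename(tmp, path) != 0)
		return -1;

	/* Make the directory entry change durable too. */
	dirfd = open(".", O_RDONLY);
	if (dirfd >= 0) {
		fsync(dirfd);
		close(dirfd);
	}
	return 0;
}

int main(void)
{
	/* Hypothetical usage with made-up file name and contents. */
	return replace_file("example.conf", "key=value\n", 10) ? 1 : 0;
}
```

For large files, the same durability rules apply to whatever update log is used to replay recent changes after a crash.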
* Re: Disabling in-memory write cache for x86-64 in Linux II 2013-10-25 18:26 ` Artem S. Tashkinov 2013-10-25 19:40 ` Diego Calleja 2013-10-25 20:43 ` NeilBrown @ 2013-10-29 20:49 ` Jan Kara 2 siblings, 0 replies; 21+ messages in thread From: Jan Kara @ 2013-10-29 20:49 UTC (permalink / raw) To: Artem S. Tashkinov Cc: david, neilb, linux-kernel, torvalds, linux-fsdevel, axboe, linux-mm On Fri 25-10-13 18:26:23, Artem S. Tashkinov wrote: > Oct 25, 2013 05:26:45 PM, david wrote: > On Fri, 25 Oct 2013, NeilBrown wrote: > > > >> > >> What exactly is bothering you about this? The amount of memory used or the > >> time until data is flushed? > > > >actually, I think the problem is more the impact of the huge write later on. > > Exactly. And not being able to use applications which show you IO > performance like Midnight Commander. You might prefer to use "cp -a" but > I cannot imagine my life without being able to see the progress of a > copying operation. With the current dirty cache there's no way to > understand how you storage media actually behaves. Large writes shouldn't stall your desktop, that's certain and we must fix that. I don't find the problem with copy progress indicators that pressing... > Hopefully this issue won't dissolve into obscurity and someone will > actually make up a plan (and a patch) how to make dirty write cache > behave in a sane manner considering the fact that there are devices with > very different write speeds and requirements. It'd be ever better, if I > could specify dirty cache as a mount option (though sane defaults or > semi-automatic values based on runtime estimates won't hurt). > > Per device dirty cache seems like a nice idea, I, for one, would like to > disable it altogether or make it an absolute minimum for things like USB > flash drives - because I don't care about multithreaded performance or > delayed allocation on such devices - I'm interested in my data reaching > my USB stick ASAP - because it's how most people use them. See my other emails in this thread. There are ways to tune the amount of dirty data allowed per device. Currently the result isn't very satisfactory but we should have something usable after the next merge window. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
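To make the per-device knob mentioned above concrete, here is a small sketch that writes /sys/class/bdi/<major>:<minor>/max_ratio for a given block device. Assumptions worth flagging: you pass the whole disk node (e.g. /dev/sdb, not a partition), the value is a percentage of the global dirty limit, and it must run as root; this is only one illustration of the tuning being discussed, not a recommended tool.

```c
/* Sketch: cap the dirty-cache share of one slow device via its BDI
 * max_ratio knob. Illustrative only; pass a whole-disk device node. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(int argc, char **argv)
{
	struct stat st;
	char path[128];
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s /dev/sdX percent\n", argv[0]);
		return 1;
	}
	if (stat(argv[1], &st) != 0 || !S_ISBLK(st.st_mode)) {
		fprintf(stderr, "%s: not a block device\n", argv[1]);
		return 1;
	}

	/* The BDI directory is named after the disk's major:minor. */
	snprintf(path, sizeof(path), "/sys/class/bdi/%u:%u/max_ratio",
		 major(st.st_rdev), minor(st.st_rdev));

	f = fopen(path, "w");
	if (!f || fprintf(f, "%s\n", argv[2]) < 0) {
		perror(path);
		return 1;
	}
	fclose(f);
	printf("%s = %s%%\n", path, argv[2]);
	return 0;
}
```

Setting it to 1 on a USB stick roughly limits that device to 1% of the global dirty threshold, which is close to the "absolute minimum" behaviour asked for earlier in the thread.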
end of thread, other threads:[~2013-11-15 15:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-25  7:25 Disabling in-memory write cache for x86-64 in Linux II Artem S. Tashkinov
2013-10-25  8:18 ` Linus Torvalds
2013-11-05  0:50   ` Andreas Dilger
2013-11-05  4:12     ` Dave Chinner
2013-11-07 13:48       ` Jan Kara
2013-11-11  3:22         ` Dave Chinner
2013-11-11 19:31           ` Jan Kara
2013-11-05  6:32   ` Figo.zhang
2013-10-25 10:49 ` NeilBrown
2013-10-25 11:26   ` David Lang
2013-10-25 18:26     ` Artem S. Tashkinov
2013-10-25 19:40       ` Diego Calleja
2013-10-25 23:32         ` Fengguang Wu
2013-11-15 15:48           ` Diego Calleja
2013-10-25 20:43       ` NeilBrown
2013-10-25 21:03         ` Artem S. Tashkinov
2013-10-25 22:11           ` NeilBrown
2013-11-05  1:40             ` Figo.zhang
2013-11-05  1:47               ` David Lang
2013-11-05  2:08               ` NeilBrown
2013-10-29 20:49       ` Jan Kara