* Terrible disk performance when files cached > 4GB
@ 2016-04-15 9:20 Colum Paget
2016-04-15 9:59 ` Michal Hocko
2016-04-15 13:56 ` Minchan Kim
0 siblings, 2 replies; 3+ messages in thread
From: Colum Paget @ 2016-04-15 9:20 UTC
To: linux-kernel
Hi all,
I suspect that many people will have reported this, but I thought I'd drop you
a line just in case everyone figures someone else has reported it. It's
possible we're just doing something wrong and so encountering this problem,
but I can't find anyone saying they've found a solution, and the problem
doesn't seem to be present in 3.x kernels, which makes us think it could be a
bug.
We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on
machines with > 4GB ram. The problem results in disk performance dropping
from 120 MB/s to 1 MB/s or even less. 3.18.x 32-bit kernels do not seem to
exhibit this behaviour, or at least we can't make it happen reliably. We've
tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem.
We've not yet been able to test 64-bit kernels; it will be a while before we
can. We've been able to reproduce the problem on multiple machines with
different hardware configs, and with different kernel configs as regards
SMP, NUMA support and transparent hugepages.
This problem can be reproduced thusly:
Unpack/transfer a *large* number of files onto disk. As they unpack one can
monitor the amount of memory being used for file caching with 'free'. Disk
transfer speeds can be tested by 'dd'-ing a large file locally. Initially the
transfer rate for this file will be over 100 MB/s. However, when the amount of
cached memory exceeds some figure (this was 4GB on some systems, 10GB on
others) disk performance will start to dramatically degrade. Very swiftly the
disks become unusable.
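A minimal sketch of how these steps might be scripted. The function names, the paths and the one-minute interval are illustrative assumptions, not the exact commands used in the tests described above.

```shell
#!/bin/sh
# Print the value (in kB) of one /proc/meminfo field, e.g. "Cached".
# The optional second argument overrides the file to read.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Copy SRC to DST with dd and print the elapsed wall-clock seconds.
# conv=fsync makes dd flush the data to disk before exiting, so the
# timing is not just the cost of filling the page cache.
time_copy() {
    start=$(date +%s)
    dd if="$1" of="$2" bs=1M conv=fsync 2>/dev/null || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Example loop (hypothetical paths): log cache size and copy time once a
# minute while the large unpack runs in another terminal.
# while sleep 60; do
#     echo "$(date +%T) cached=$(meminfo_kb Cached)kB copy=$(time_copy /data/big.img /data/big.copy)s"
# done
```

Logging both numbers side by side makes it easier to see at what cache size the copy time starts to climb.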
On some machines this situation can be recovered by:
    echo 3 > /proc/sys/vm/drop_caches
However, we've seen some cases where even this doesn't seem to help, and the
machine has to be rebooted.
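Writing 3 drops both the page cache (1) and the reclaimable dentry and inode slab objects (2). A hedged sketch for dropping them separately, to see which of the two makes the difference; the helper names are hypothetical and the script has to run as root.

```shell
#!/bin/sh
# Print one /proc/meminfo field in kB (second argument overrides the file).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Drop caches in the given mode and report the change. Needs root.
#   1 = page cache only, 2 = dentries and inodes (slab), 3 = both
drop_and_report() {
    case "$1" in
        1|2|3) ;;
        *) echo "mode must be 1, 2 or 3" >&2; return 2 ;;
    esac
    before_cache=$(meminfo_kb Cached)
    before_slab=$(meminfo_kb SReclaimable)
    sync    # write back dirty data first; only clean objects can be dropped
    echo "$1" > /proc/sys/vm/drop_caches || return 1
    echo "mode $1: Cached $before_cache -> $(meminfo_kb Cached) kB," \
         "SReclaimable $before_slab -> $(meminfo_kb SReclaimable) kB"
}

# Example: drop_and_report 2, re-run the dd test, then drop_and_report 1.
```

If the dd rate recovers after mode 2 alone, that would point at slab objects rather than cached file data.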
We believe the problem is that the memory cache gets so big that searching
through it becomes slower than reading files directly off disk. One problem
with this theory is that we're always copying the same file over and over in
our tests, so the file is unlikely to be a 'cache miss'. Personally, I would
have expected performance to only be bad for cache misses, but it's bad for
everything, so maybe our theory is wrong.
For our purposes, we're fine running with 3.14.x series kernels, but I thought
I should let you know.
regards,
Colum
--
Colum Paget
Axiom Software Engineer
Phone: 01827 61212
* Re: Terrible disk performance when files cached > 4GB
From: Michal Hocko @ 2016-04-15 9:59 UTC
To: Colum Paget; +Cc: linux-kernel, linux-mm
On Fri 15-04-16 10:20:33, Colum Paget wrote:
> Hi all,
>
> I suspect that many people will have reported this, but I thought I'd drop you
> a line just in case everyone figures someone else has reported it. It's
> possible we're just doing something wrong and so encountering this problem,
> but I can't find anyone saying they've found a solution, and the problem
> doesn't seem to be present in 3.x kernels, which makes us think it could be a
> bug.
>
> We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on
> machines with > 4GB ram.
I would generally discourage you from using much more than 4G on a 32b
system. Lowmem memory pressure is a real problem which is inherent to
highmem kernels.
> The problem results in disk performance dropping
> from 120 MB/s to 1MB/s or even less. 3.18.x 32-bit kernels do not seem to
> exhibit this behaviour, or at least we can't make it happen reliably. We've
> tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem.
I would expect this is due to dirty memory throttling. Highmem is not
considered dirtyable by default (see global_dirtyable_memory()), so all
writers get throttled earlier. Basically, any change to how much
memory can be dirtied in lowmem will change the balance for you.
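A rough back-of-the-envelope sketch of why excluding highmem matters. It assumes the global limit is simply vm.dirty_ratio percent of dirtyable memory, which ignores reserves, vm.dirty_bytes and per-device limits. The 800 MB lowmem and 16 GB total figures are made-up examples, not numbers from the report.

```shell
#!/bin/sh
# Approximate global dirty limit in kB: RATIO percent of DIRTYABLE_KB.
# (Simplification: the kernel also subtracts reserves, and vm.dirty_bytes,
# if set, replaces the ratio.)
dirty_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Use the running system's ratio; fall back to 20 if it cannot be read.
ratio=$(cat /proc/sys/vm/dirty_ratio 2>/dev/null || echo 20)

# Hypothetical 32-bit box: about 800 MB of lowmem, 16 GB in total.
echo "highmem excluded: $(dirty_limit_kb 819200 "$ratio") kB"
echo "highmem included: $(dirty_limit_kb 16777216 "$ratio") kB"
```

With a ratio of 20, for instance, that is roughly 160 MB of allowed dirty data against roughly 3.2 GB, so writers hit the limit far sooner when only lowmem counts.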
> We've not yet been able to test 64 bit kernels, it will be a while before we
> can. We've been able to reproduce the problem on multiple machines with
> different hardware configs, and with different kernel configs as regards
> SMP , NUMA support and transparent hugepages.
>
> This problem can be reproduced thusly:
Have you tried the following?
    echo 1 > /proc/sys/vm/highmem_is_dirtyable
Please note that this might help, but it is a double-edged sword because
it might cause premature OOM kills under certain loads. 32b is simply
not that great with a lot of memory.
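A small sketch for checking the setting before changing it. The function name is hypothetical; the file itself only exists on kernels built with highmem support.

```shell
#!/bin/sh
# Report the current highmem_is_dirtyable setting. The file only exists on
# kernels built with CONFIG_HIGHMEM, so its absence is reported, not fatal.
# The optional argument overrides the path (useful for testing the helper).
show_highmem_dirtyable() {
    knob="${1:-/proc/sys/vm/highmem_is_dirtyable}"
    if [ -r "$knob" ]; then
        echo "highmem_is_dirtyable=$(cat "$knob")"
    else
        echo "highmem_is_dirtyable not available"
    fi
}

show_highmem_dirtyable

# As root, to try the setting until the next reboot:
#   sysctl -w vm.highmem_is_dirtyable=1
# Watching these two fields during the copy gives a rough idea of how much
# dirty data is allowed to build up:
#   grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Because the change is not persistent, a reboot restores the default if the OOM behaviour turns out to be worse.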
HTH
--
Michal Hocko
SUSE Labs
* Re: Terrible disk performance when files cached > 4GB
From: Minchan Kim @ 2016-04-15 13:56 UTC
To: Colum Paget; +Cc: linux-kernel
On Fri, Apr 15, 2016 at 10:20:33AM +0100, Colum Paget wrote:
> Hi all,
>
> I suspect that many people will have reported this, but I thought I'd drop you
> a line just in case everyone figures someone else has reported it. It's
> possible we're just doing something wrong and so encountering this problem,
> but I can't find anyone saying they've found a solution, and the problem
> doesn't seem to be present in 3.x kernels, which makes us think it could be a
> bug.
>
> We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on
> machines with > 4GB ram. The problem results in disk performance dropping
> from 120 MB/s to 1MB/s or even less. 3.18.x 32-bit kernels do not seem to
> exhibit this behaviour, or at least we can't make it happen reliably. We've
> tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem.
> We've not yet been able to test 64 bit kernels, it will be a while before we
> can. We've been able to reproduce the problem on multiple machines with
> different hardware configs, and with different kernel configs as regards
> SMP , NUMA support and transparent hugepages.
>
> This problem can be reproduced thusly:
>
> Unpack/transfer a *large* number of files onto disk. As they unpack one can
> monitor the amount of memory being used for file caching with 'free'. Disk
> transfer speeds can be tested by 'dd'-ing a large file locally. Initially the
> transfer rate for this file will be over 100 MB/s. However, when the amount of
> cached memory exceeds some figure (this was 4GB on some systems, 10GB on
> others) disk performance will start to dramatically degrade. Very swiftly the
> disks become unusable.
>
> On some machines this situation can be recovered by:
>
> echo 3 > /proc/sys/vm/drop_caches
>
> However, we've seen some cases where even this doesn't seem to help, and the
> machine has to be rebooted.
>
> We believe the problem is that the memory cache gets so big that searching
> through it becomes slower than reading files directly off disk. One problem
> with this theory is that we're always copying the same file over and over in
> our tests, so the file is unlikely to be a 'cache miss', personally I would
> have expected performance to only be bad for cache misses, but it's bad for
> everything, so maybe our theory is wrong.
>
> For our purposes, we're fine running with 3.14.x series kernels, but I thought
> I should let you know.
>
> regards,
>
> Colum
Did you see this patch?
https://lkml.org/lkml/2016/4/3/237
It fixes a bug introduced by commit 6b4f7799c6a5 ("mm: vmscan: invoke slab
shrinkers from shrink_zone()"), which was merged in v3.19. IOW, up to 3.18 it was okay.
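A quick sketch for checking whether a given kernel release is new enough to contain that commit. The function name is hypothetical, and `sort -V` is a GNU/busybox extension rather than POSIX. The check says nothing about whether the fix has been applied, and distribution kernels may backport either change.

```shell
#!/bin/sh
# Succeed if kernel release $1 (default: the running kernel) is v3.19 or
# newer, i.e. new enough to contain commit 6b4f7799c6a5.
has_shrinker_change() {
    ver=${1:-$(uname -r)}
    ver=${ver%%-*}    # strip a local suffix such as "-hugemem"
    # Version-sort the two strings; if 3.19 sorts first, $ver is >= 3.19.
    oldest=$(printf '%s\n' "3.19" "$ver" | sort -V | head -n 1)
    [ "$oldest" = "3.19" ]
}

if has_shrinker_change; then
    echo "kernel is >= 3.19: may be affected unless the fix is applied"
else
    echo "kernel predates 3.19"
fi
```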
Thanks.