* Terrible disk performance when files cached > 4GB
@ 2016-04-15  9:20 Colum Paget
  2016-04-15  9:59 ` Michal Hocko
  2016-04-15 13:56 ` Minchan Kim
  0 siblings, 2 replies; 3+ messages in thread
From: Colum Paget @ 2016-04-15  9:20 UTC
  To: linux-kernel

Hi all,

I suspect that many people will have reported this, but I thought I'd drop you 
a line just in case everyone figures someone else has reported it. It's 
possible we're just doing something wrong and so encountering this problem, 
but I can't find anyone saying they've found a solution, and the problem 
doesn't seem to be present in 3.x kernels, which makes us think it could be a 
bug.

We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on 
machines with > 4GB ram. The problem results in disk performance dropping 
from 120 MB/s to 1MB/s or even less. 3.18.x 32-bit kernels do not seem to 
exhibit this behaviour, or at least we can't make it happen reliably. We've 
tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem. 
We've not yet been able to test 64-bit kernels; it will be a while before we 
can. We've been able to reproduce the problem on multiple machines with 
different hardware configs, and with different kernel configs as regards 
SMP, NUMA support and transparent hugepages.

This problem can be reproduced thusly:

Unpack/transfer a *large* number of files onto disk. As they unpack one can 
monitor the amount of memory being used for file caching with 'free'. Disk 
transfer speeds can be tested by 'dd'-ing a large file locally. Initially the 
transfer rate for this file will be over 100MB/s. However, when the amount of 
cached memory exceeds some figure (this was 4GB on some systems, 10GB on 
others) disk performance will start to dramatically degrade. Very swiftly the 
disks become unusable.
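
The monitoring and timing steps above could be scripted roughly as follows. This is only a sketch: the function names `cached_kb` and `time_copy` and the paths in the commented-out loop are hypothetical, and `bs=1M` and `conv=fsync` assume GNU dd. The `conv=fsync` option makes dd flush the data to disk before exiting, so the elapsed time reflects the disk rather than the page cache.

```shell
# Print the "Cached" figure (in kB) from a meminfo-style file.
# The optional argument defaults to /proc/meminfo.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# Copy SRC to DST with dd and print the elapsed whole seconds.
time_copy() {
    src=$1
    dst=$2
    start=$(date +%s)
    dd if="$src" of="$dst" bs=1M conv=fsync 2>/dev/null || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Example loop, commented out so that nothing is written unless requested
# (both paths are placeholders):
# while sleep 10; do
#     echo "cached: $(cached_kb) kB, copy took $(time_copy /data/big.bin /data/big.copy) s"
# done
```

Running the loop while the files unpack should show the copy time climbing once the cached figure passes the threshold described above, if the problem is present.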

On some machines this situation can be recovered by:

  echo 3 > /proc/sys/vm/drop_caches

However, we've seen some cases where even this doesn't seem to help, and the 
machine has to be rebooted.
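
For reference, the value written to drop_caches selects what is dropped: 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Dirty pages are not freed, so running `sync` first lets more be dropped. A minimal wrapper, assuming root privileges, might look like this (the name `drop_caches` and its optional second argument are hypothetical, added only so the target file can be overridden):

```shell
# drop_caches MODE [CONTROL_FILE]
# MODE: 1 = page cache, 2 = dentries and inodes, 3 = both.
drop_caches() {
    mode=$1
    ctl=${2:-/proc/sys/vm/drop_caches}
    case "$mode" in
        1|2|3) ;;
        *) echo "mode must be 1, 2 or 3" >&2; return 1 ;;
    esac
    sync                     # write dirty pages back so they become droppable
    echo "$mode" > "$ctl"
}

# Usage (as root): drop_caches 3
```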

We believe the problem is that the memory cache gets so big that searching 
through it becomes slower than reading files directly off disk. One problem 
with this theory is that we're always copying the same file over and over in 
our tests, so the file is unlikely to be a 'cache miss'. Personally I would 
have expected performance to only be bad for cache misses, but it's bad for 
everything, so maybe our theory is wrong.

For our purposes, we're fine running with 3.14.x series kernels, but I thought 
I should let you know.

regards,

Colum

-- 
Colum Paget
Axiom Software Engineer
Phone: 01827 61212


* Re: Terrible disk performance when files cached > 4GB
  2016-04-15  9:20 Terrible disk performance when files cached > 4GB Colum Paget
@ 2016-04-15  9:59 ` Michal Hocko
  2016-04-15 13:56 ` Minchan Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Hocko @ 2016-04-15  9:59 UTC
  To: Colum Paget; +Cc: linux-kernel, linux-mm

On Fri 15-04-16 10:20:33, Colum Paget wrote:
> Hi all,
> 
> I suspect that many people will have reported this, but I thought I'd drop you 
> a line just in case everyone figures someone else has reported it. It's 
> possible we're just doing something wrong and so encountering this problem, 
> but I can't find anyone saying they've found a solution, and the problem 
> doesn't seem to be present in 3.x kernels, which makes us think it could be a 
> bug.
> 
> We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on 
> machines with > 4GB ram.

I would generally discourage you from using much more than 4G on a 32b
system. Lowmem memory pressure is a real problem which is inherent to
highmem kernels.

> The problem results in disk performance dropping 
> from 120 MB/s to 1MB/s or even less. 3.18.x 32-bit kernels do not seem to 
> exhibit this behaviour, or at least we can't make it happen reliably. We've 
> tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem.

I would expect this is due to dirty memory throttling. Highmem is not
considered dirtyable normally (see global_dirtyable_memory) and so all
the writers will get throttled earlier. Basically any change to how much
memory can be dirtied in the lowmem will change the balance for you.
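
To give a feel for the scale of the effect, here is a rough back-of-the-envelope sketch. It assumes the dirty limit is simply a percentage of dirtyable memory; the 20 percent ratio and both memory sizes are hypothetical, and the kernel's real calculation counts free and file-cache pages rather than total RAM, so treat the figures as an order-of-magnitude illustration only.

```shell
# dirty_limit_kb DIRTYABLE_KB RATIO_PERCENT
# Prints the amount of memory (kB) that may be dirty before writers throttle.
dirty_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

lowmem_kb=819200      # hypothetical: about 800 MiB of lowmem
total_kb=16777216     # hypothetical: 16 GiB of lowmem plus highmem
ratio=20              # hypothetical dirty ratio, in percent

echo "highmem not dirtyable: $(dirty_limit_kb "$lowmem_kb" "$ratio") kB"
echo "highmem dirtyable:     $(dirty_limit_kb "$total_kb" "$ratio") kB"
```

With these made-up numbers the limit differs by a factor of about twenty, which is why writers hit the throttle so much earlier when only lowmem counts.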

> We've not yet been able to test 64 bit kernels, it will be a while before we 
> can. We've been able to reproduce the problem on multiple machines with 
> different hardware configs, and with different kernel configs as regards 
> SMP , NUMA support and transparent hugepages.
> 
> This problem can be reproduced thusly:

Have you tried
echo 1 > /proc/sys/vm/highmem_is_dirtyable

Please note that this might help but it is a double-edged sword because
it might cause premature OOM kills under certain loads. 32b is simply
not that great with a lot of memory.
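
Before changing the setting it may be worth checking whether the knob exists and what it is currently set to. A small sketch follows; the function name `show_highmem_dirtyable` and its optional path argument are hypothetical, and the file is only expected on kernels built with highmem support.

```shell
# Print the current highmem_is_dirtyable setting, or a note if the
# control file is absent. The optional argument overrides the path.
show_highmem_dirtyable() {
    ctl=${1:-/proc/sys/vm/highmem_is_dirtyable}
    if [ -r "$ctl" ]; then
        echo "highmem_is_dirtyable=$(cat "$ctl")"
    else
        echo "highmem_is_dirtyable not available"
    fi
}

show_highmem_dirtyable
```

The same knob should also be reachable as the sysctl key `vm.highmem_is_dirtyable`, which is the usual way to make a setting persistent across reboots.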

HTH
-- 
Michal Hocko
SUSE Labs


* Re: Terrible disk performance when files cached > 4GB
  2016-04-15  9:20 Terrible disk performance when files cached > 4GB Colum Paget
  2016-04-15  9:59 ` Michal Hocko
@ 2016-04-15 13:56 ` Minchan Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Minchan Kim @ 2016-04-15 13:56 UTC
  To: Colum Paget; +Cc: linux-kernel

On Fri, Apr 15, 2016 at 10:20:33AM +0100, Colum Paget wrote:
> Hi all,
> 
> I suspect that many people will have reported this, but I thought I'd drop you 
> a line just in case everyone figures someone else has reported it. It's 
> possible we're just doing something wrong and so encountering this problem, 
> but I can't find anyone saying they've found a solution, and the problem 
> doesn't seem to be present in 3.x kernels, which makes us think it could be a 
> bug.
> 
> We are seeing a problem in 4.4.5 and 4.4.6 32-bit 'hugemem' kernels running on 
> machines with > 4GB ram. The problem results in disk performance dropping 
> from 120 MB/s to 1MB/s or even less. 3.18.x 32-bit kernels do not seem to 
> exhibit this behaviour, or at least we can't make it happen reliably. We've 
> tried 3.14.65 and 3.14.65 and they don't exhibit the same degree of problem. 
> We've not yet been able to test 64 bit kernels, it will be a while before we 
> can. We've been able to reproduce the problem on multiple machines with 
> different hardware configs, and with different kernel configs as regards 
> SMP , NUMA support and transparent hugepages.
> 
> This problem can be reproduced thusly:
> 
> Unpack/transfer a *large* number of files onto disk. As they unpack one can 
> monitor the amount of memory being used for file caching with 'free'. Disk 
> transfer speeds can be tested by 'dd'-ing a large file locally. Initially the 
> transfer rate for this file will be over 100MB/s. However, when the amount of 
> cached memory exceeds some figure (this was 4GB on some systems, 10GB on 
> others) disk performance will start to dramatically degrade. Very swiftly the 
> disks become unusable.
> 
> On some machines this situation can be recovered by:
> 
>   echo 3 > /proc/sys/vm/drop_caches
> 
> However, we've seen some cases where even this doesn't seem to help, and the 
> machine has to be rebooted.
> 
> We believe the problem is that the memory cache gets so big that searching 
> through it becomes slower than reading files directly off disk. One problem 
> with this theory is that we're always copying the same file over and over in 
> our tests, so the file is unlikely to be a 'cache miss', personally I would 
> have expected performance to only be bad for cache misses, but it's bad for 
> everything, so maybe our theory is wrong.
> 
> For our purposes, we're fine running with 3.14.x series kernels, but I thought 
> I should let you know.
> 
> regards,
> 
> Colum

Did you see this patch?

https://lkml.org/lkml/2016/4/3/237

It fixes a bug introduced by 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers
from shrink_zone()"), which was applied to v3.19. IOW, until 3.18, it was okay.
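
A quick way to see which side of that boundary a machine is on is to compare its kernel version against 3.19. The sketch below assumes GNU `sort -V` is available; the function names are hypothetical. It only tells you whether the kernel is new enough to contain the commit, not whether the fix has been applied, and vendor kernels with backports may not follow mainline version numbers.

```shell
# version_ge A B: succeed if version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the given kernel release (default: the running kernel) is
# v3.19 or later. Any suffix after the first '-' is ignored.
kernel_has_shrinker_change() {
    ver=${1:-$(uname -r)}
    version_ge "${ver%%-*}" 3.19
}

if kernel_has_shrinker_change; then
    echo "kernel is 3.19 or later"
else
    echo "kernel predates 3.19"
fi
```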

Thanks.
