From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boaz Harrosh
Subject: Re: [LSF/VM TOPIC] Dynamic sizing of dirty_limit
Date: Mon, 08 Mar 2010 09:33:22 +0200
Message-ID: <4B94A842.9010902@panasas.com>
References: <20100224143442.GF3687@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Jan Kara, lsf10-pc@lists.linuxfoundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
To: Christoph Lameter
Return-path:
Received: from daytona.panasas.com ([67.152.220.89]:50485 "EHLO daytona.int.panasas.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751740Ab0CHHd2 (ORCPT ); Mon, 8 Mar 2010 02:33:28 -0500
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 02/24/2010 06:10 PM, Christoph Lameter wrote:
> On Wed, 24 Feb 2010, Jan Kara wrote:
>
>> fine (and you probably don't want much more because the memory is better
>> used for something else), when a machine does random rewrites, going to 40%
>> might be well worth it. So I'd like to discuss how we could measure that
>> increasing amount of dirtiable memory helps so that we could implement
>> dynamic sizing of it.
>
> Another issue around dirty limits is that they are global. If you are
> running multiple jobs on the same box (memcg or cpusets or you set
> affinities to separate the box) then every job may need different dirty
> limits. One idea that I had in the past was to set dirty limits based on
> nodes or cpusets. But that will not cover the other cases that I have
> listed above.
>
> The best solution would be an algorithm that can accomodate multiple loads
> and manage the amount of dirty memory automatically.
>

One more point to consider, if changes are made in this area (and they
should be), is the stacking-filesystems problem. There are many examples;
here is a simple one. A local iSCSI target, backed by a file on a
filesystem, is logged into from the local host, and the resulting block
device is mounted with a filesystem.
Such a setup used to deadlock, and today it has very poor, dribbling
performance. This is because the upper-layer filesystem consumes the whole
cache quota and leaves no cache headroom for the lower-level FS, forcing
the lower-level FS into page-by-page write-out (at best). By contrast,
running the same scenario through a UML or VM (with the iSCSI initiator +
upper FS inside the UML) solves this problem and performs optimally.

There are endless examples of stacked filesystems, including local NFS
mounts, clustered setups with local access to one of the devices, and so
on. All of these perform badly.

A per-FS cache limit (proportional to performance; cache is best measured
by a time constant) should easily solve this problem as well.

Boaz
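
To make the "time constant" idea concrete, here is a minimal sketch of how
a per-FS limit could be derived. It is only an illustration under
assumptions that are not in the message above: the names FsStats and
per_fs_dirty_limits, the measured write-back bandwidth input, and the
proportional scale-down under a global budget are all hypothetical, and
none of this is an existing kernel interface.

```python
# Hypothetical illustration only: not kernel code and not a proposed patch.
from dataclasses import dataclass


@dataclass
class FsStats:
    name: str
    write_bw: float  # assumed measured write-back bandwidth, bytes/second


def per_fs_dirty_limits(filesystems, time_constant_s, total_budget):
    """Return a dirty-cache limit in bytes for each filesystem.

    Each limit is bandwidth * time constant, i.e. the amount of dirty data
    the filesystem can write out in time_constant_s seconds. If the limits
    together exceed total_budget, all of them are scaled down by the same
    factor, so no filesystem is left without headroom.
    """
    raw = {fs.name: fs.write_bw * time_constant_s for fs in filesystems}
    total = sum(raw.values())
    scale = min(1.0, total_budget / total) if total > 0 else 0.0
    return {name: int(limit * scale) for name, limit in raw.items()}


if __name__ == "__main__":
    MB = 1 << 20
    stack = [
        FsStats("upper-fs-on-iscsi", 50 * MB),  # assumed figures
        FsStats("lower-fs-backing-file", 100 * MB),
    ]
    for name, limit in per_fs_dirty_limits(stack, 2.0, 200 * MB).items():
        print(f"{name}: {limit / MB:.1f} MB")
```

With a scheme of this shape the upper filesystem cannot take the whole
quota: the lower-level FS keeps a share in proportion to how fast it can
actually write back.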