public inbox for linux-kernel@vger.kernel.org
* [patch 0/7] cpuset writeback throttling
@ 2008-10-30 19:23 David Rientjes
  2008-10-30 19:23 ` [patch 1/7] cpusets: add dirty map to struct address_space David Rientjes
From: David Rientjes @ 2008-10-30 19:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Nick Piggin, Peter Zijlstra, Paul Menage,
	Derek Fults, linux-kernel

Andrew,

This is the revised cpuset writeback throttling patchset posted to LKML 
on Tuesday, October 27.

The comments from Peter Zijlstra have been addressed.  His concurrent 
page cache patchset is not currently in -mm, so we can still serialize 
updating a struct address_space's dirty_nodes on its tree_lock.  When his 
patchset is merged, the patch at the end of this message can be used to 
introduce the necessary synchronization.

This patchset applies nicely to 2.6.28-rc2-mm1 with the exception of the 
first patch due to the alloc_inode() refactoring to inode_init_always() in
e9110864c440736beb484c2c74dedc307168b14e from linux-next and additions to 
include/linux/cpuset.h from 
oom-print-triggering-tasks-cpuset-and-mems-allowed.patch (oops :).

Please consider this for inclusion in the -mm tree.

A simple way of testing this change is to create a large file that exceeds 
the amount of memory allotted to a specific cpuset, then mmap and modify 
the large file (such as in the following program) while running a 
latency-sensitive task in a disjoint cpuset.  The writeout throttling 
should not interfere with the latency-sensitive task.

#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
	void *addr;
	unsigned long length;
	unsigned long i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <filename> <length>\n",
			argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		fprintf(stderr, "Cannot open file %s\n", argv[1]);
		exit(1);
	}

	length = strtoul(argv[2], NULL, 0);
	if (!length) {
		fprintf(stderr, "Invalid length %s\n", argv[2]);
		exit(1);
	}

	addr = mmap(0, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		    0);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		exit(1);
	}

	for (;;) {
		for (i = 0; i < length; i++)
			((char *)addr)[i]++;
		msync(addr, length, MS_SYNC);
	}
	return 0;
}
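The cpuset setup described above can be sketched as follows.  This assumes
the 2.6-era cpuset pseudo-filesystem mounted at /dev/cpuset; the node
numbers, file sizes, and the binary name (dirty-mmap, i.e. the program
above after compilation) are all illustrative, and root is required.

```shell
# Mount the cpuset filesystem if it is not already mounted.
mount -t cpuset none /dev/cpuset 2>/dev/null || true

# Cpuset for the dirtying task, confined to CPU 0 and memory node 0.
mkdir -p /dev/cpuset/writer
echo 0 > /dev/cpuset/writer/cpus
echo 0 > /dev/cpuset/writer/mems

# Disjoint cpuset for the latency-sensitive task, on CPU/node 1.
mkdir -p /dev/cpuset/latency
echo 1 > /dev/cpuset/latency/cpus
echo 1 > /dev/cpuset/latency/mems

# Create a file larger than node 0's memory (4 GiB here, illustrative).
dd if=/dev/zero of=/tmp/bigfile bs=1M count=4096

# Move this shell into the writer cpuset and start dirtying the file.
echo $$ > /dev/cpuset/writer/tasks
./dirty-mmap /tmp/bigfile $((4096 * 1024 * 1024)) &
```

The latency-sensitive workload is then run from a shell attached to
/dev/cpuset/latency/tasks in the same manner.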



The following patch can be applied once the struct address_space's 
tree_lock is removed to protect the attachment of mapping->dirty_nodes.
---
diff --git a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -223,6 +223,9 @@ void inode_init_once(struct inode *inode)
 	INIT_LIST_HEAD(&inode->inotify_watches);
 	mutex_init(&inode->inotify_mutex);
 #endif
+#if MAX_NUMNODES > BITS_PER_LONG
+	spin_lock_init(&inode->i_data.dirty_nodes_lock);
+#endif
 }
 
 EXPORT_SYMBOL(inode_init_once);
diff --git a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -554,6 +554,7 @@ struct address_space {
 	nodemask_t		dirty_nodes;	/* nodes with dirty pages */
 #else
 	nodemask_t		*dirty_nodes;	/* pointer to mask, if dirty */
+	spinlock_t		dirty_nodes_lock; /* protects the above */
 #endif
 #endif
 } __attribute__((aligned(sizeof(long))));
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2413,25 +2413,27 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
 #if MAX_NUMNODES > BITS_PER_LONG
 /*
  * Special functions for NUMA systems with a large number of nodes.  The
- * nodemask is pointed to from the address_space structure.  The attachment of
- * the dirty_nodes nodemask is protected by the tree_lock.  The nodemask is
- * freed only when the inode is cleared (and therefore unused, thus no locking
- * is necessary).
+ * nodemask is pointed to from the address_space structure.
  */
 void cpuset_update_dirty_nodes(struct address_space *mapping,
 			       struct page *page)
 {
-	nodemask_t *nodes = mapping->dirty_nodes;
+	nodemask_t *nodes;
 	int node = page_to_nid(page);
 
+	spin_lock_irq(&mapping->dirty_nodes_lock);
+	nodes = mapping->dirty_nodes;
 	if (!nodes) {
 		nodes = kmalloc(sizeof(nodemask_t), GFP_ATOMIC);
-		if (!nodes)
+		if (!nodes) {
+			spin_unlock_irq(&mapping->dirty_nodes_lock);
 			return;
+		}
 
 		*nodes = NODE_MASK_NONE;
 		mapping->dirty_nodes = nodes;
 	}
+	spin_unlock_irq(&mapping->dirty_nodes_lock);
 	node_set(node, *nodes);
 }
 
@@ -2446,8 +2448,8 @@ void cpuset_clear_dirty_nodes(struct address_space *mapping)
 }
 
 /*
- * Called without tree_lock.  The nodemask is only freed when the inode is
- * cleared and therefore this is safe.
+ * The nodemask is only freed when the inode is cleared and therefore this
+ * requires no locking.
  */
 int cpuset_intersects_dirty_nodes(struct address_space *mapping,
 				  nodemask_t *mask)


Thread overview: 45+ messages
2008-10-30 19:23 [patch 0/7] cpuset writeback throttling David Rientjes
2008-10-30 19:23 ` [patch 1/7] cpusets: add dirty map to struct address_space David Rientjes
2008-11-04 21:09   ` Andrew Morton
2008-11-04 21:20     ` Christoph Lameter
2008-11-04 21:42       ` Andrew Morton
2008-10-30 19:23 ` [patch 2/7] pdflush: allow the passing of a nodemask parameter David Rientjes
2008-10-30 19:23 ` [patch 3/7] mm: make page writeback obey cpuset constraints David Rientjes
2008-10-30 19:23 ` [patch 5/7] mm: throttle writeout with cpuset awareness David Rientjes
2008-10-30 19:23 ` [patch 4/7] mm: cpuset aware reclaim writeout David Rientjes
2008-10-30 19:23 ` [patch 6/7] cpusets: per cpuset dirty ratios David Rientjes
2008-10-30 19:23 ` [patch 7/7] cpusets: update documentation for writeback throttling David Rientjes
2008-10-30 21:08 ` [patch 0/7] cpuset " Dave Chinner
2008-10-30 21:33   ` Christoph Lameter
2008-10-30 22:03     ` Dave Chinner
2008-10-31 13:47       ` Christoph Lameter
2008-10-31 16:36       ` David Rientjes
2008-11-04 20:47 ` Andrew Morton
2008-11-04 20:53   ` Peter Zijlstra
2008-11-04 20:58     ` Christoph Lameter
2008-11-04 21:10     ` David Rientjes
2008-11-04 21:16     ` Andrew Morton
2008-11-04 21:21       ` Peter Zijlstra
2008-11-04 21:50         ` Andrew Morton
2008-11-04 22:17           ` Christoph Lameter
2008-11-04 22:35             ` Andrew Morton
2008-11-04 22:52               ` Christoph Lameter
2008-11-04 23:36                 ` Andrew Morton
2008-11-05  1:31                   ` KAMEZAWA Hiroyuki
2008-11-05  3:09                     ` Andrew Morton
2008-11-05  2:45                   ` Christoph Lameter
2008-11-05  3:05                     ` Andrew Morton
2008-11-05  4:31                       ` KAMEZAWA Hiroyuki
2008-11-10  9:02                         ` Andrea Righi
2008-11-10 10:02                           ` David Rientjes
2008-11-05 13:52                       ` Christoph Lameter
2008-11-05 18:41                         ` Andrew Morton
2008-11-05 20:21                           ` Christoph Lameter
2008-11-05 20:31                             ` Andrew Morton
2008-11-05 20:40                               ` Christoph Lameter
2008-11-05 20:56                                 ` Andrew Morton
2008-11-05 21:28                                   ` Christoph Lameter
2008-11-05 21:55                                   ` Paul Menage
2008-11-05 22:04                                   ` David Rientjes
2008-11-06  1:34                                     ` KAMEZAWA Hiroyuki
2008-11-06 20:35                                       ` David Rientjes
