linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim
@ 2010-08-04 14:38 Mel Gorman
  2010-08-04 14:38 ` [PATCH 1/2] writeback: Prioritise dirty inodes encountered by reclaim for background flushing Mel Gorman
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Mel Gorman @ 2010-08-04 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: Wu Fengguang, Andrew Morton, Dave Chinner, Chris Mason,
	Nick Piggin, Rik van Riel, Johannes Weiner, Christoph Hellwig,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Mel Gorman

Commenting on the series "Reduce writeback from page reclaim context V6",
Andrew Morton noted:

  direct-reclaim wants to write a dirty page because that page is in the
  zone which the caller wants to allocate from!  Telling the flusher threads
  to perform generic writeback will sometimes cause them to just gum the
  disk up with pages from different zones, making it even harder/slower to
  allocate a page from the zones we're interested in, no?

On the machines used to test the series, there were relatively few zones
and only one BDI, so the scenario described is a possibility. This series
is a very early prototype aimed at mitigating the problem.

Patch 1 adds wakeup_flusher_threads_pages() which takes a list of pages
from page reclaim. Each inode belonging to a page on the list is marked
I_DIRTY_RECLAIM. When the flusher thread wakes, inodes with this tag are
unconditionally moved to the wb->b_io list for writing.
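As a rough illustration of the intended behaviour (not the kernel code
itself), the promotion step can be modelled in user space. The function
name and the flag value below are assumptions chosen for illustration:

```c
#include <stddef.h>

/* Hypothetical user-space model of the dispatch-order change: inodes
 * tagged I_DIRTY_RECLAIM by page reclaim are pulled to the head of the
 * dispatch order regardless of age, while untagged inodes keep their
 * existing (age-based) relative order. */
#define I_DIRTY_RECLAIM 0x100

static void build_dispatch_order(const unsigned *state, int n, int *order)
{
	int pos = 0;

	/* First pass: reclaim-tagged inodes go to the front */
	for (int i = 0; i < n; i++)
		if (state[i] & I_DIRTY_RECLAIM)
			order[pos++] = i;

	/* Second pass: everything else follows in original order */
	for (int i = 0; i < n; i++)
		if (!(state[i] & I_DIRTY_RECLAIM))
			order[pos++] = i;
}
```

In the kernel patch itself the tag is also cleared as the inode is moved;
the model above only captures the reordering.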

Patch 2 notes that writing back inodes does not necessarily write back
pages belonging to the zone page reclaim is concerned with. In response, it
adds a zone and a counter to wb_writeback_work. As pages from the target
zone are written, the zone-specific counter is updated, and the flusher
thread checks that counter instead of the overall one when a specific zone
is being targeted. While more pages may be written than strictly necessary,
the assumptions are that the pages need cleaning eventually, that the inode
must be relatively old to have pages at the end of the LRU, that the IO
will be relatively efficient due to fewer random seeks, and that pages from
the target zone will still be cleaned.
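The zone-aware budget described above can be sketched as a small
user-space model. The struct and function names are illustrative, not the
kernel's; only the counting logic mirrors the patch:

```c
#include <stddef.h>

/* Hypothetical model of the zone-aware accounting: nr_to_write counts
 * every page written, nr_zone_to_write only counts pages from the target
 * zone, and writeback stops once the relevant budget is exhausted. */
struct wbc_model {
	long nr_to_write;      /* overall page budget */
	long nr_zone_to_write; /* budget for the target zone */
	int  target_zone;      /* -1 means no zone targeted */
};

/* Returns how many pages were written before a budget ran out. */
static long write_pages_model(struct wbc_model *wbc,
			      const int *page_zone, long npages)
{
	long written = 0;

	for (long i = 0; i < npages; i++) {
		long budget;

		wbc->nr_to_write--;
		written++;

		if (wbc->target_zone >= 0) {
			/* Only pages from the target zone consume the
			 * zone budget, but every page is still written */
			if (page_zone[i] == wbc->target_zone)
				wbc->nr_zone_to_write--;
			budget = wbc->nr_zone_to_write;
		} else {
			budget = wbc->nr_to_write;
		}

		if (budget <= 0)
			break;
	}
	return written;
}
```

Note that pages from other zones are still written along the way, which is
exactly the "more pages may be written than necessary" trade-off above.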

Testing did not show any significant difference in the number of dirty
file pages being written back, but the lack of multiple BDIs and NUMA
nodes in the test rig is a problem. Maybe someone else has access to a
more suitable test rig.

Any comments on the suitability of such a direction?

 fs/fs-writeback.c         |   83 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h        |    5 ++-
 include/linux/writeback.h |    5 +++
 mm/page-writeback.c       |   12 ++++++-
 mm/vmscan.c               |   11 ++++--
 5 files changed, 103 insertions(+), 13 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* [PATCH 1/2] writeback: Prioritise dirty inodes encountered by reclaim for background flushing
  2010-08-04 14:38 [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Mel Gorman
@ 2010-08-04 14:38 ` Mel Gorman
  2010-08-04 14:38 ` [PATCH 2/2] writeback: Account for pages written back belonging to a particular zone Mel Gorman
  2010-08-04 22:56 ` [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Andrew Morton
  2 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2010-08-04 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: Wu Fengguang, Andrew Morton, Dave Chinner, Chris Mason,
	Nick Piggin, Rik van Riel, Johannes Weiner, Christoph Hellwig,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Mel Gorman

It is preferable that as few dirty pages as possible are dispatched for
cleaning from the page reclaim path. When dirty pages are encountered by
page reclaim, this patch marks their inodes so that they are dispatched
promptly. When the background flusher runs, it moves such inodes straight
to the dispatch queue regardless of inode age.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 fs/fs-writeback.c         |   52 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h        |    5 ++-
 include/linux/writeback.h |    1 +
 mm/vmscan.c               |    6 +++-
 4 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d5be169..0912f93 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -205,9 +205,17 @@ static void move_expired_inodes(struct list_head *delaying_queue,
 	LIST_HEAD(tmp);
 	struct list_head *pos, *node;
 	struct super_block *sb = NULL;
-	struct inode *inode;
+	struct inode *inode, *tinode;
 	int do_sb_sort = 0;
 
+	/* Move inodes reclaim found at end of LRU to dispatch queue */
+	list_for_each_entry_safe(inode, tinode, delaying_queue, i_list) {
+		if (inode->i_state & I_DIRTY_RECLAIM) {
+			inode->i_state &= ~I_DIRTY_RECLAIM;
+			list_move(&inode->i_list, &tmp);
+		}
+	}
+
 	while (!list_empty(delaying_queue)) {
 		inode = list_entry(delaying_queue->prev, struct inode, i_list);
 		if (older_than_this &&
@@ -838,6 +846,48 @@ void wakeup_flusher_threads(long nr_pages)
 	rcu_read_unlock();
 }
 
+/*
+ * Similar to wakeup_flusher_threads except prioritise inodes contained
+ * in the page_list regardless of age
+ */
+void wakeup_flusher_threads_pages(long nr_pages, struct list_head *page_list)
+{
+	struct page *page;
+	struct address_space *mapping;
+	struct inode *inode;
+
+	list_for_each_entry(page, page_list, lru) {
+		if (!PageDirty(page))
+			continue;
+
+		lock_page(page);
+		mapping = page_mapping(page);
+		if (!mapping || mapping == &swapper_space)
+			goto unlock;
+
+		/*
+		 * Test outside the lock to see if it is already set; taking
+		 * the inode lock would be a waste and the inode should be
+		 * pinned by the lock_page
+		 */
+		inode = page->mapping->host;
+		if (inode->i_state & I_DIRTY_RECLAIM)
+			goto unlock;
+
+		/*
+		 * XXX: Yuck, has to be a way of batching this by not requiring
+		 *	the page lock to pin the inode
+		 */
+		spin_lock(&inode_lock);
+		inode->i_state |= I_DIRTY_RECLAIM;
+		spin_unlock(&inode_lock);
+unlock:
+		unlock_page(page);
+	}
+
+	wakeup_flusher_threads(nr_pages);
+}
+
 static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 {
 	if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 68ca1b0..19ad1f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1585,8 +1585,8 @@ struct super_operations {
 /*
  * Inode state bits.  Protected by inode_lock.
  *
- * Three bits determine the dirty state of the inode, I_DIRTY_SYNC,
- * I_DIRTY_DATASYNC and I_DIRTY_PAGES.
+ * Four bits determine the dirty state of the inode, I_DIRTY_SYNC,
+ * I_DIRTY_DATASYNC, I_DIRTY_PAGES and I_DIRTY_RECLAIM.
  *
  * Four bits define the lifetime of an inode.  Initially, inodes are I_NEW,
  * until that flag is cleared.  I_WILL_FREE, I_FREEING and I_CLEAR are set at
@@ -1633,6 +1633,7 @@ struct super_operations {
 #define I_DIRTY_SYNC		1
 #define I_DIRTY_DATASYNC	2
 #define I_DIRTY_PAGES		4
+#define I_DIRTY_RECLAIM		256
 #define __I_NEW			3
 #define I_NEW			(1 << __I_NEW)
 #define I_WILL_FREE		16
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index c24eca7..7d4eee4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -66,6 +66,7 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
 		struct writeback_control *wbc);
 long wb_do_writeback(struct bdi_writeback *wb, int force_wait);
 void wakeup_flusher_threads(long nr_pages);
+void wakeup_flusher_threads_pages(long nr_pages, struct list_head *page_list);
 
 /* writeback.h requires fs.h; it, too, is not included from here. */
 static inline void wait_on_inode(struct inode *inode)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4c81bc..c997d80 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -901,7 +901,8 @@ keep:
 	 * laptop mode avoiding disk spin-ups
 	 */
 	if (file && nr_dirty_seen && sc->may_writepage)
-		wakeup_flusher_threads(nr_writeback_pages(nr_dirty));
+		wakeup_flusher_threads_pages(nr_writeback_pages(nr_dirty),
+					page_list);
 
 	*nr_still_dirty = nr_dirty;
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -1368,7 +1369,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 				list_add(&page->lru, &putback_list);
 			}
 
-			wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty);
+			wakeup_flusher_threads_pages(laptop_mode ? 0 : nr_dirty,
+								&page_list);
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 			/*
-- 
1.7.1



* [PATCH 2/2] writeback: Account for pages written back belonging to a particular zone
  2010-08-04 14:38 [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Mel Gorman
  2010-08-04 14:38 ` [PATCH 1/2] writeback: Prioritise dirty inodes encountered by reclaim for background flushing Mel Gorman
@ 2010-08-04 14:38 ` Mel Gorman
  2010-08-04 22:56 ` [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Andrew Morton
  2 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2010-08-04 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: Wu Fengguang, Andrew Morton, Dave Chinner, Chris Mason,
	Nick Piggin, Rik van Riel, Johannes Weiner, Christoph Hellwig,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Mel Gorman

When reclaim encounters dirty file pages on the LRU lists, it wakes up
flusher threads to clean pages belonging to old inodes. If an inode has
dirty pages in multiple zones, the flusher threads may exit before pages
within the zone of interest are clean. This patch accounts separately for
pages written back from the zone page reclaim is interested in.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 fs/fs-writeback.c         |   35 ++++++++++++++++++++++++++++-------
 include/linux/writeback.h |    6 +++++-
 mm/page-writeback.c       |   12 +++++++++++-
 mm/vmscan.c               |    9 +++++----
 4 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0912f93..cc52322 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -40,8 +40,10 @@ int nr_pdflush_threads;
  */
 struct wb_writeback_work {
 	long nr_pages;
+	long nr_zone_pages;
 	struct super_block *sb;
 	enum writeback_sync_modes sync_mode;
+	struct zone *zone;
 	unsigned int for_kupdate:1;
 	unsigned int range_cyclic:1;
 	unsigned int for_background:1;
@@ -85,6 +87,7 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
+		struct zone *zone,
 		bool range_cyclic, bool for_background)
 {
 	struct wb_writeback_work *work;
@@ -104,6 +107,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
 	work->for_background = for_background;
+	if (zone) {
+		work->zone = zone;
+		work->nr_zone_pages = nr_pages;
+	}
 
 	bdi_queue_work(bdi, work);
 }
@@ -121,7 +128,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true, false);
+	__bdi_start_writeback(bdi, nr_pages, NULL, true, false);
 }
 
 /**
@@ -135,7 +142,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
  */
 void bdi_start_background_writeback(struct backing_dev_info *bdi)
 {
-	__bdi_start_writeback(bdi, LONG_MAX, true, true);
+	__bdi_start_writeback(bdi, LONG_MAX, NULL, true, true);
 }
 
 /*
@@ -650,17 +657,25 @@ static long wb_writeback(struct bdi_writeback *wb,
 		wbc.more_io = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		wbc.pages_skipped = 0;
+		if (work->zone) {
+			wbc.zone = work->zone;
+			wbc.nr_zone_to_write = wbc.nr_to_write;
+		}
 		if (work->sb)
 			__writeback_inodes_sb(work->sb, wb, &wbc);
 		else
 			writeback_inodes_wb(wb, &wbc);
 		work->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+		if (work->zone)
+			work->nr_zone_pages -=
+				MAX_WRITEBACK_PAGES - wbc.nr_zone_to_write;
 		wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 
 		/*
 		 * If we consumed everything, see if we have more
 		 */
-		if (wbc.nr_to_write <= 0)
+		if ((wbc.zone && wbc.nr_zone_to_write <= 0) ||
+							wbc.nr_to_write <= 0)
 			continue;
 		/*
 		 * Didn't write everything and we don't have more IO, bail
@@ -828,7 +843,7 @@ int bdi_writeback_task(struct bdi_writeback *wb)
  * Start writeback of `nr_pages' pages.  If `nr_pages' is zero, write back
  * the whole world.
  */
-void wakeup_flusher_threads(long nr_pages)
+static void wakeup_flusher_threads_zone(long nr_pages, struct zone *zone)
 {
 	struct backing_dev_info *bdi;
 
@@ -841,7 +856,7 @@ void wakeup_flusher_threads(long nr_pages)
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false, false);
+		__bdi_start_writeback(bdi, nr_pages, zone, false, false);
 	}
 	rcu_read_unlock();
 }
@@ -850,7 +865,8 @@ void wakeup_flusher_threads(long nr_pages)
  * Similar to wakeup_flusher_threads except prioritise inodes contained
  * in the page_list regardless of age
  */
-void wakeup_flusher_threads_pages(long nr_pages, struct list_head *page_list)
+void wakeup_flusher_threads_pages(long nr_pages, struct zone *zone,
+						struct list_head *page_list)
 {
 	struct page *page;
 	struct address_space *mapping;
@@ -885,7 +901,12 @@ unlock:
 		unlock_page(page);
 	}
 
-	wakeup_flusher_threads(nr_pages);
+	wakeup_flusher_threads_zone(nr_pages, zone);
+}
+
+void wakeup_flusher_threads(long nr_pages)
+{
+	wakeup_flusher_threads_zone(nr_pages, NULL);
 }
 
 static noinline void block_dump___mark_inode_dirty(struct inode *inode)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 7d4eee4..91e9b89 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -35,7 +35,10 @@ struct writeback_control {
 					   extra jobs and livelock */
 	long nr_to_write;		/* Write this many pages, and decrement
 					   this for each page written */
+	long nr_zone_to_write;		/* Write this many pages from a
+					   specific zone */
 	long pages_skipped;		/* Pages which were not written */
+	struct zone *zone;		/* Zone that needs clean pages */
 
 	/*
 	 * For a_ops->writepages(): is start or end are non-zero then this is
@@ -66,7 +69,8 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
 		struct writeback_control *wbc);
 long wb_do_writeback(struct bdi_writeback *wb, int force_wait);
 void wakeup_flusher_threads(long nr_pages);
-void wakeup_flusher_threads_pages(long nr_pages, struct list_head *page_list);
+void wakeup_flusher_threads_pages(long nr_pages, struct zone *zone,
+						struct list_head *page_list);
 
 /* writeback.h requires fs.h; it, too, is not included from here. */
 static inline void wait_on_inode(struct inode *inode)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 37498ef..0bdddac 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -950,7 +950,17 @@ continue_unlock:
 			}
 
 			if (wbc->nr_to_write > 0) {
-				if (--wbc->nr_to_write == 0 &&
+				long nr_to_write;
+				wbc->nr_to_write--;
+
+				if (wbc->zone) {
+					if (wbc->zone == page_zone(page))
+						wbc->nr_zone_to_write--;
+					nr_to_write = wbc->nr_zone_to_write;
+				} else
+					nr_to_write = wbc->nr_to_write;
+
+				if (nr_to_write == 0 &&
 				    wbc->sync_mode == WB_SYNC_NONE) {
 					/*
 					 * We stop writing back only if we are
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c997d80..036e9fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -660,6 +660,7 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
 					struct scan_control *sc,
+					struct zone *zone,
 					enum pageout_io sync_writeback,
 					int file,
 					unsigned long *nr_still_dirty)
@@ -902,7 +903,7 @@ keep:
 	 */
 	if (file && nr_dirty_seen && sc->may_writepage)
 		wakeup_flusher_threads_pages(nr_writeback_pages(nr_dirty),
-					page_list);
+					zone, page_list);
 
 	*nr_still_dirty = nr_dirty;
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -1343,7 +1344,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	spin_unlock_irq(&zone->lru_lock);
 
-	nr_reclaimed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC,
+	nr_reclaimed = shrink_page_list(&page_list, sc, zone, PAGEOUT_IO_ASYNC,
 							file, &nr_dirty);
 
 	/*
@@ -1370,7 +1371,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 			}
 
 			wakeup_flusher_threads_pages(laptop_mode ? 0 : nr_dirty,
-								&page_list);
+							zone, &page_list);
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 			/*
@@ -1380,7 +1381,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 			nr_active = clear_active_flags(&page_list, NULL);
 			count_vm_events(PGDEACTIVATE, nr_active);
 
-			nr_reclaimed += shrink_page_list(&page_list, sc,
+			nr_reclaimed += shrink_page_list(&page_list, sc, zone,
 						PAGEOUT_IO_SYNC, file,
 						&nr_dirty);
 		}
-- 
1.7.1



* Re: [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim
  2010-08-04 14:38 [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Mel Gorman
  2010-08-04 14:38 ` [PATCH 1/2] writeback: Prioritise dirty inodes encountered by reclaim for background flushing Mel Gorman
  2010-08-04 14:38 ` [PATCH 2/2] writeback: Account for pages written back belonging to a particular zone Mel Gorman
@ 2010-08-04 22:56 ` Andrew Morton
  2010-08-05 13:42   ` Mel Gorman
  2 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2010-08-04 22:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, linux-fsdevel, linux-mm, Wu Fengguang, Dave Chinner,
	Chris Mason, Nick Piggin, Rik van Riel, Johannes Weiner,
	Christoph Hellwig, KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Wed,  4 Aug 2010 15:38:29 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> Commenting on the series "Reduce writeback from page reclaim context V6"
> Andrew Morton noted;
> 
>   direct-reclaim wants to write a dirty page because that page is in the
>   zone which the caller wants to allocate from!  Telling the flusher threads
>   to perform generic writeback will sometimes cause them to just gum the
>   disk up with pages from different zones, making it even harder/slower to
>   allocate a page from the zones we're interested in, no?
> 
> On the machines used to test the series, there were relatively few zones
> and only one BDI so the scenario describes is a possibility. This series is
> a very early prototype series aimed at mitigating the problem.
> 
> Patch 1 adds wakeup_flusher_threads_pages() which takes a list of pages
> from page reclaim. Each inode belonging to a page on the list is marked
> I_DIRTY_RECLAIM. When the flusher thread wakes, inodes with this tag are
> unconditionally moved to the wb->b_io list for writing.
> 
> Patch 2 notes that writing back inodes does not necessarily write back
> pages belonging to the zone page reclaim is concerned with. In response, it
> adds a zone and counter to wb_writeback_work. As pages from the target zone
> are written, the zone-specific counter is updated. When the flusher thread
> then checks the zone counters if a specific zone is being targeted. While
> more pages may be written than necessary, the assumption is that the pages
> need cleaning eventually, the inode must be relatively old to have pages at
> the end of the LRU, the IO will be relatively efficient due to less random
> seeks and that pages from the target zone will still be cleaned.
> 
> Testing did not show any significant differences in terms of reducing dirty
> file pages being written back but the lack of multiple BDIs and NUMA nodes in
> the test rig is a problem. Maybe someone else has access to a more suitable
> test rig.
> 
> Any comment as to the suitability for such a direction?

um.  Might work.  Isn't pretty though.

But until we can demonstrate the problem or someone reports it, we
probably have more important issues to be looking at ;) I think that a
better approach is to try to trigger this problem as we develop and
test reclaim.  And if we _can't_ demonstrate it, work out why the heck
not - either the code's smarter than we thought it was or the test is
no good.



* Re: [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim
  2010-08-04 22:56 ` [RFC PATCH 0/2] Prioritise inodes and zones for writeback required by page reclaim Andrew Morton
@ 2010-08-05 13:42   ` Mel Gorman
  0 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2010-08-05 13:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, linux-mm, Wu Fengguang, Dave Chinner,
	Chris Mason, Nick Piggin, Rik van Riel, Johannes Weiner,
	Christoph Hellwig, KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Wed, Aug 04, 2010 at 03:56:10PM -0700, Andrew Morton wrote:
> On Wed,  4 Aug 2010 15:38:29 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > Commenting on the series "Reduce writeback from page reclaim context V6"
> > Andrew Morton noted;
> > 
> >   direct-reclaim wants to write a dirty page because that page is in the
> >   zone which the caller wants to allocate from!  Telling the flusher threads
> >   to perform generic writeback will sometimes cause them to just gum the
> >   disk up with pages from different zones, making it even harder/slower to
> >   allocate a page from the zones we're interested in, no?
> > 
> > On the machines used to test the series, there were relatively few zones
> > and only one BDI so the scenario describes is a possibility. This series is
> > a very early prototype series aimed at mitigating the problem.
> > 
> > Patch 1 adds wakeup_flusher_threads_pages() which takes a list of pages
> > from page reclaim. Each inode belonging to a page on the list is marked
> > I_DIRTY_RECLAIM. When the flusher thread wakes, inodes with this tag are
> > unconditionally moved to the wb->b_io list for writing.
> > 
> > Patch 2 notes that writing back inodes does not necessarily write back
> > pages belonging to the zone page reclaim is concerned with. In response, it
> > adds a zone and counter to wb_writeback_work. As pages from the target zone
> > are written, the zone-specific counter is updated. When the flusher thread
> > then checks the zone counters if a specific zone is being targeted. While
> > more pages may be written than necessary, the assumption is that the pages
> > need cleaning eventually, the inode must be relatively old to have pages at
> > the end of the LRU, the IO will be relatively efficient due to less random
> > seeks and that pages from the target zone will still be cleaned.
> > 
> > Testing did not show any significant differences in terms of reducing dirty
> > file pages being written back but the lack of multiple BDIs and NUMA nodes in
> > the test rig is a problem. Maybe someone else has access to a more suitable
> > test rig.
> > 
> > Any comment as to the suitability for such a direction?
> 
> um.  Might work.  Isn't pretty though.
> 

No, it's not.

> But until we can demonstrate the problem or someone reports it, we
> probably have more important issues to be looking at ;) I think that a
> better approach is to try to trigger this problem as we develop and
> test reclaim. 

That's a reasonable plan, as we'll know for sure whether this is the right
direction or not. I'll put the patches on the back-burner for now, and
hopefully someone will remember them if a bug is reported about large
stalls under memory pressure that is specific to a machine with many nodes
and many disks.

> And if we _can't_ demonstrate it, work out why the heck
> not - either the code's smarter than we thought it was or the test is
> no good.
> 

It's always possible that we won't be able to demonstrate it because the
right file pages are getting cleaned more often than not by the time
reclaim happens :/

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
