linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/4] mm: account_page_writeback added
  2010-08-20  9:31 [PATCH 0/4] writeback: kernel visibility Michael Rubin
@ 2010-08-20  9:31 ` Michael Rubin
  2010-08-20  9:45   ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-20  9:31 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe,
	Michael Rubin

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using
these routines means that some code will lose track of the accounting
and we get bugs.

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 fs/nilfs2/segment.c |    2 +-
 include/linux/mm.h  |    1 +
 mm/page-writeback.c |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9fd051a..5617f16 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1599,7 +1599,7 @@ nilfs_copy_replace_page_buffers(struct page *page, struct list_head *out)
 	kunmap_atomic(kaddr, KM_USER0);
 
 	if (!TestSetPageWriteback(clone_page))
-		inc_zone_page_state(clone_page, NR_WRITEBACK);
+		account_page_writeback(clone_page);
 	unlock_page(clone_page);
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 709f672..4b2f38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -856,6 +856,7 @@ int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 void account_page_dirtied(struct page *page, struct address_space *mapping);
+void account_page_writeback(struct page *page);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d07a8d..ae5f5d5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1134,6 +1134,17 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * Helper function for set_page_writeback family.
+ * NOTE: Unlike account_page_dirtied this does not rely on being atomic
+ * wrt interrupts.
+ */
+void account_page_writeback(struct page *page)
+{
+	inc_zone_page_state(page, NR_WRITEBACK);
+}
+EXPORT_SYMBOL(account_page_writeback);
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
@@ -1371,7 +1382,7 @@ int test_set_page_writeback(struct page *page)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret)
-		inc_zone_page_state(page, NR_WRITEBACK);
+		account_page_writeback(page);
 	return ret;
 
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] mm: account_page_writeback added
  2010-08-20  9:31 ` [PATCH 2/4] mm: account_page_writeback added Michael Rubin
@ 2010-08-20  9:45   ` Wu Fengguang
  2010-08-20 10:08     ` KOSAKI Motohiro
  0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2010-08-20  9:45 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, riel@redhat.com,
	akpm@linux-foundation.org, david@fromorbit.com, npiggin@kernel.dk,
	hch@lst.de, axboe@kernel.dk

I'm not sure this should be an inline function, just a reminder.
Even with one more inc_zone_page_state() in next patch.

> +void account_page_writeback(struct page *page)
> +{
> +	inc_zone_page_state(page, NR_WRITEBACK);
> +}
> +EXPORT_SYMBOL(account_page_writeback);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] mm: account_page_writeback added
  2010-08-20  9:45   ` Wu Fengguang
@ 2010-08-20 10:08     ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-08-20 10:08 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Michael Rubin, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz,
	riel@redhat.com, akpm@linux-foundation.org, david@fromorbit.com,
	npiggin@kernel.dk, hch@lst.de, axboe@kernel.dk

> I'm not sure this should be an inline function, just a reminder.
> Even with one more inc_zone_page_state() in next patch.
> 
> > +void account_page_writeback(struct page *page)
> > +{
> > +	inc_zone_page_state(page, NR_WRITEBACK);
> > +}
> > +EXPORT_SYMBOL(account_page_writeback);

Personally, I like inline. but it's no big matter.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 0/4] writeback: kernel visibility
@ 2010-08-28  2:40 Michael Rubin
  2010-08-28  2:40 ` [PATCH 1/4] mm: exporting account_page_dirty Michael Rubin
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Michael Rubin @ 2010-08-28  2:40 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

Patch #1 sets up some helper functions for account_page_dirty

Patch #2 sets up some helper functions for account_page_writeback

Patch #3 adds writeback visibility in /proc/vmstat

To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.

   # grep nr_dirtied /proc/vmstat
   nr_dirtied 3747
   # grep nr_cleaned /proc/vmstat
   nr_cleaned 3618

These entries allow user apps to understand writeback behaviour over
time and learn how it is impacting their performance. Currently there
is no way to inspect dirty and writeback speed over time. It's not
possible for nr_dirty/nr_writeback.

These entries are necessary to give visibility into writeback
behaviour. We have /proc/diskstats which lets us understand the io in
the block layer. We have blktrace for more in depth understanding. We have
e2fsprogs and debugsfs to give insight into the file systems behaviour,
but we don't offer our users the ability understand what writeback is
doing. There is no way to know how active it is over the whole system,
if it's falling behind or to quantify it's efforts. With these values
exported users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data. Comparing the rates of change between the two allow developers
to see when writeback is not able to keep up with incoming traffic and
the rate of dirty memory being sent to the IO back end. This allows
folks to understand their io workloads and track kernel issues. Non
kernel engineers at Google often use these counters to solve puzzling
performance problems.

Patch #4 add writeback thresholds to /proc/vmstat

 # grep threshold /proc/vmstat
 nr_pages_dirty_threshold 409111
 nr_pages_dirty_background_threshold 818223

The files that report the dirty thresholds belong in /proc/vmstat. They
are meant for application writers so should not be in debugfs. But since
they are more related to internals of writeback, albeit internals that
are fundamental to how it works, /proc/sys/vm is not appropriate.

These values are reported in debugfs already in
/debug/bdi/default/stats. Since debugfs is intended for kernel developers
and /proc for applications there is an argument to put it in /proc. Not
sure if that's enough but thought it worth attaching.

Michael Rubin (4):
  mm: exporting account_page_dirty
  mm: account_page_writeback added
  writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  writeback: Reporting dirty thresholds in /proc/vmstat

 drivers/base/node.c    |   14 ++++++++++++++
 fs/ceph/addr.c         |    8 +-------
 fs/nilfs2/segment.c    |    2 +-
 include/linux/mm.h     |    1 +
 include/linux/mmzone.h |    4 ++++
 mm/page-writeback.c    |   16 +++++++++++++++-
 mm/vmstat.c            |    8 ++++++++
 7 files changed, 44 insertions(+), 9 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] mm: exporting account_page_dirty
  2010-08-28  2:40 [PATCH 0/4] writeback: kernel visibility Michael Rubin
@ 2010-08-28  2:40 ` Michael Rubin
  2010-08-28 22:11   ` Sage Weil
  2010-08-28  2:40 ` [PATCH 2/4] mm: account_page_writeback added Michael Rubin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-28  2:40 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Modified cephs to use the interface.

Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

---
 fs/ceph/addr.c      |    8 +-------
 mm/page-writeback.c |    1 +
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5598a0d..420d469 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
 	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(!PageUptodate(page));
-
-		if (mapping_cap_account_dirty(mapping)) {
-			__inc_zone_page_state(page, NR_FILE_DIRTY);
-			__inc_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
-			task_io_account_write(PAGE_CACHE_SIZE);
-		}
+		account_page_dirtied(page, page->mapping);
 		radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7262aac..9d07a8d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		task_io_account_write(PAGE_CACHE_SIZE);
 	}
 }
+EXPORT_SYMBOL(account_page_dirtied);
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] mm: account_page_writeback added
  2010-08-28  2:40 [PATCH 0/4] writeback: kernel visibility Michael Rubin
  2010-08-28  2:40 ` [PATCH 1/4] mm: exporting account_page_dirty Michael Rubin
@ 2010-08-28  2:40 ` Michael Rubin
  2010-08-28  2:40 ` [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat Michael Rubin
  2010-08-28  2:40 ` [PATCH 4/4] writeback: Reporting dirty thresholds " Michael Rubin
  3 siblings, 0 replies; 19+ messages in thread
From: Michael Rubin @ 2010-08-28  2:40 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using
these routines means that some code will lose track of the accounting
and we get bugs.

Modified nilfs2 to use interface.

Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/nilfs2/segment.c |    2 +-
 include/linux/mm.h  |    1 +
 mm/page-writeback.c |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9fd051a..5617f16 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1599,7 +1599,7 @@ nilfs_copy_replace_page_buffers(struct page *page, struct list_head *out)
 	kunmap_atomic(kaddr, KM_USER0);
 
 	if (!TestSetPageWriteback(clone_page))
-		inc_zone_page_state(clone_page, NR_WRITEBACK);
+		account_page_writeback(clone_page);
 	unlock_page(clone_page);
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 709f672..4b2f38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -856,6 +856,7 @@ int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 void account_page_dirtied(struct page *page, struct address_space *mapping);
+void account_page_writeback(struct page *page);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d07a8d..ae5f5d5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1134,6 +1134,17 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * Helper function for set_page_writeback family.
+ * NOTE: Unlike account_page_dirtied this does not rely on being atomic
+ * wrt interrupts.
+ */
+void account_page_writeback(struct page *page)
+{
+	inc_zone_page_state(page, NR_WRITEBACK);
+}
+EXPORT_SYMBOL(account_page_writeback);
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
@@ -1371,7 +1382,7 @@ int test_set_page_writeback(struct page *page)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret)
-		inc_zone_page_state(page, NR_WRITEBACK);
+		account_page_writeback(page);
 	return ret;
 
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-08-28  2:40 [PATCH 0/4] writeback: kernel visibility Michael Rubin
  2010-08-28  2:40 ` [PATCH 1/4] mm: exporting account_page_dirty Michael Rubin
  2010-08-28  2:40 ` [PATCH 2/4] mm: account_page_writeback added Michael Rubin
@ 2010-08-28  2:40 ` Michael Rubin
  2010-08-28 23:50   ` Wu Fengguang
  2010-08-28  2:40 ` [PATCH 4/4] writeback: Reporting dirty thresholds " Michael Rubin
  3 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-28  2:40 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

To help developers and applications gain visibility into writeback
behaviour adding two entries to /proc/vmstat.

   # grep nr_dirtied /proc/vmstat
   nr_dirtied 3747
   # grep nr_cleaned /proc/vmstat
   nr_cleaned 3618

In order to track the "cleaned" and "dirtied" counts we added two
vm_stat_items. Per memory node stats have been added also. So we can
see per node granularity:

   # cat /sys/devices/system/node/node20/vmstat
   Node 20 pages_cleaned: 0 times
   Node 20 pages_dirtied: 0 times

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 drivers/base/node.c    |   14 ++++++++++++++
 include/linux/mmzone.h |    2 ++
 mm/page-writeback.c    |    2 ++
 mm/vmstat.c            |    3 +++
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2872e86..facd920 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -160,6 +160,18 @@ static ssize_t node_read_numastat(struct sys_device * dev,
 }
 static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
+static ssize_t node_read_vmstat(struct sys_device *dev,
+				struct sysdev_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	return sprintf(buf,
+		"Node %d pages_cleaned: %lu times\n"
+		"Node %d pages_dirtied: %lu times\n",
+		nid, node_page_state(nid, NR_PAGES_CLEANED),
+		nid, node_page_state(nid, NR_FILE_PAGES_DIRTIED));
+}
+static SYSDEV_ATTR(vmstat, S_IRUGO, node_read_vmstat, NULL);
+
 static ssize_t node_read_distance(struct sys_device * dev,
 			struct sysdev_attribute *attr, char * buf)
 {
@@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
 		sysdev_create_file(&node->sysdev, &attr_distance);
+		sysdev_create_file(&node->sysdev, &attr_vmstat);
 
 		scan_unevictable_register_node(node);
 
@@ -267,6 +280,7 @@ void unregister_node(struct node *node)
 	sysdev_remove_file(&node->sysdev, &attr_meminfo);
 	sysdev_remove_file(&node->sysdev, &attr_numastat);
 	sysdev_remove_file(&node->sysdev, &attr_distance);
+	sysdev_remove_file(&node->sysdev, &attr_vmstat);
 
 	scan_unevictable_unregister_node(node);
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e6e626..d42f179 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -104,6 +104,8 @@ enum zone_stat_item {
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
+	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
+	NR_PAGES_CLEANED,	/* number of times pages enter writeback */
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ae5f5d5..19bb8c0 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1126,6 +1126,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
+		__inc_zone_page_state(page, NR_FILE_PAGES_DIRTIED);
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
@@ -1141,6 +1142,7 @@ EXPORT_SYMBOL(account_page_dirtied);
 void account_page_writeback(struct page *page)
 {
 	inc_zone_page_state(page, NR_WRITEBACK);
+	inc_zone_page_state(page, NR_PAGES_CLEANED);
 }
 EXPORT_SYMBOL(account_page_writeback);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f389168..8521475 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -732,6 +732,9 @@ static const char * const vmstat_text[] = {
 	"nr_isolated_anon",
 	"nr_isolated_file",
 	"nr_shmem",
+	"nr_dirtied",
+	"nr_cleaned",
+
 #ifdef CONFIG_NUMA
 	"numa_hit",
 	"numa_miss",
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-28  2:40 [PATCH 0/4] writeback: kernel visibility Michael Rubin
                   ` (2 preceding siblings ...)
  2010-08-28  2:40 ` [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat Michael Rubin
@ 2010-08-28  2:40 ` Michael Rubin
  2010-08-30  0:28   ` KOSAKI Motohiro
  3 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-28  2:40 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

The kernel already exposes the user desired thresholds in /proc/sys/vm
with dirty_background_ratio and background_ratio. But the kernel may
alter the number requested without giving the user any indication that
is the case.

Knowing the actual ratios the kernel is honoring can help app developers
understand how their buffered IO will be sent to the disk.

        $ grep threshold /proc/vmstat
        nr_dirty_threshold 409111
        nr_dirty_background_threshold 818223

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 include/linux/mmzone.h |    2 ++
 mm/vmstat.c            |    5 +++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d42f179..ad48963 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -106,6 +106,8 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
 	NR_PAGES_CLEANED,	/* number of times pages enter writeback */
+	NR_DIRTY_THRESHOLD,	/* writeback threshold */
+	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8521475..2342010 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -17,6 +17,7 @@
 #include <linux/vmstat.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/writeback.h>
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -734,6 +735,8 @@ static const char * const vmstat_text[] = {
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_cleaned",
+	"nr_dirty_threshold",
+	"nr_dirty_background_threshold",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",
@@ -917,6 +920,8 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		v[i] = global_page_state(i);
+
+	global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD, v + NR_DIRTY_THRESHOLD);
 #ifdef CONFIG_VM_EVENT_COUNTERS
 	e = v + NR_VM_ZONE_STAT_ITEMS;
 	all_vm_events(e);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] mm: exporting account_page_dirty
  2010-08-28  2:40 ` [PATCH 1/4] mm: exporting account_page_dirty Michael Rubin
@ 2010-08-28 22:11   ` Sage Weil
  0 siblings, 0 replies; 19+ messages in thread
From: Sage Weil @ 2010-08-28 22:11 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, fengguang.wu, jack, riel,
	akpm, david, kosaki.motohiro, npiggin, hch, axboe

This one was just merged with the other Ceph stuff.

Thanks!
sage


On Fri, 27 Aug 2010, Michael Rubin wrote:

> This allows code outside of the mm core to safely manipulate page state
> and not worry about the other accounting. Not using these routines means
> that some code will lose track of the accounting and we get bugs. This
> has happened once already.
> 
> Modified cephs to use the interface.
> 
> Signed-off-by: Michael Rubin <mrubin@google.com>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> 
> ---
>  fs/ceph/addr.c      |    8 +-------
>  mm/page-writeback.c |    1 +
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 5598a0d..420d469 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
>  	spin_lock_irq(&mapping->tree_lock);
>  	if (page->mapping) {	/* Race with truncate? */
>  		WARN_ON_ONCE(!PageUptodate(page));
> -
> -		if (mapping_cap_account_dirty(mapping)) {
> -			__inc_zone_page_state(page, NR_FILE_DIRTY);
> -			__inc_bdi_stat(mapping->backing_dev_info,
> -					BDI_RECLAIMABLE);
> -			task_io_account_write(PAGE_CACHE_SIZE);
> -		}
> +		account_page_dirtied(page, page->mapping);
>  		radix_tree_tag_set(&mapping->page_tree,
>  				page_index(page), PAGECACHE_TAG_DIRTY);
>  
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7262aac..9d07a8d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>  		task_io_account_write(PAGE_CACHE_SIZE);
>  	}
>  }
> +EXPORT_SYMBOL(account_page_dirtied);
>  
>  /*
>   * For address_spaces which do not use buffers.  Just tag the page as dirty in
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-08-28  2:40 ` [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat Michael Rubin
@ 2010-08-28 23:50   ` Wu Fengguang
  2010-08-31  6:09     ` Michael Rubin
  0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2010-08-28 23:50 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, riel@redhat.com,
	akpm@linux-foundation.org, david@fromorbit.com,
	kosaki.motohiro@jp.fujitsu.com, npiggin@kernel.dk, hch@lst.de,
	axboe@kernel.dk

On Sat, Aug 28, 2010 at 10:40:26AM +0800, Michael Rubin wrote:
> To help developers and applications gain visibility into writeback
> behaviour adding two entries to /proc/vmstat.
> 
>    # grep nr_dirtied /proc/vmstat
>    nr_dirtied 3747
>    # grep nr_cleaned /proc/vmstat
>    nr_cleaned 3618
> 
> In order to track the "cleaned" and "dirtied" counts we added two
> vm_stat_items. Per memory node stats have been added also. So we can
> see per node granularity:
> 
>    # cat /sys/devices/system/node/node20/vmstat
>    Node 20 pages_cleaned: 0 times
>    Node 20 pages_dirtied: 0 times

It's silly to have the different names nr_dirtied and pages_cleaned
for the same item.

> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  drivers/base/node.c    |   14 ++++++++++++++
>  include/linux/mmzone.h |    2 ++
>  mm/page-writeback.c    |    2 ++
>  mm/vmstat.c            |    3 +++
>  4 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 2872e86..facd920 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -160,6 +160,18 @@ static ssize_t node_read_numastat(struct sys_device * dev,
>  }
>  static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
>  
> +static ssize_t node_read_vmstat(struct sys_device *dev,
> +				struct sysdev_attribute *attr, char *buf)
> +{
> +	int nid = dev->id;
> +	return sprintf(buf,
> +		"Node %d pages_cleaned: %lu times\n"
> +		"Node %d pages_dirtied: %lu times\n",
> +		nid, node_page_state(nid, NR_PAGES_CLEANED),
> +		nid, node_page_state(nid, NR_FILE_PAGES_DIRTIED));
> +}

The output format is quite different from /proc/vmstat.
Do we really need to "Node X", ":" and "times" decorations?

And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
the "_page" in node_page_state(). It's a bit long to be a pleasant
name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.

> +static SYSDEV_ATTR(vmstat, S_IRUGO, node_read_vmstat, NULL);
> +
>  static ssize_t node_read_distance(struct sys_device * dev,
>  			struct sysdev_attribute *attr, char * buf)
>  {
> @@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
>  		sysdev_create_file(&node->sysdev, &attr_meminfo);
>  		sysdev_create_file(&node->sysdev, &attr_numastat);
>  		sysdev_create_file(&node->sysdev, &attr_distance);
> +		sysdev_create_file(&node->sysdev, &attr_vmstat);
>  
>  		scan_unevictable_register_node(node);
>  
> @@ -267,6 +280,7 @@ void unregister_node(struct node *node)
>  	sysdev_remove_file(&node->sysdev, &attr_meminfo);
>  	sysdev_remove_file(&node->sysdev, &attr_numastat);
>  	sysdev_remove_file(&node->sysdev, &attr_distance);
> +	sysdev_remove_file(&node->sysdev, &attr_vmstat);
>  
>  	scan_unevictable_unregister_node(node);
>  	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6e6e626..d42f179 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -104,6 +104,8 @@ enum zone_stat_item {
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
>  	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
> +	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
> +	NR_PAGES_CLEANED,	/* number of times pages enter writeback */

How about the comments /* accumulated number of pages ... */?

Note that NR_CLEANED won't match NR_FILE_DIRTIED in long term because
it also accounts for anon pages, and does not account for dirty pages
that are truncated before they go writeback.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-28  2:40 ` [PATCH 4/4] writeback: Reporting dirty thresholds " Michael Rubin
@ 2010-08-30  0:28   ` KOSAKI Motohiro
  2010-08-30 16:25     ` Michael Rubin
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-08-30  0:28 UTC (permalink / raw)
  To: Michael Rubin
  Cc: kosaki.motohiro, linux-kernel, linux-fsdevel, linux-mm,
	fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe

> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and background_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
> 
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
> 
>         $ grep threshold /proc/vmstat
>         nr_dirty_threshold 409111
>         nr_dirty_background_threshold 818223

?
afaict, you and wu agreed /debug/bdi/default/stats is enough good.
why do you change your mention?


> 
> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  include/linux/mmzone.h |    2 ++
>  mm/vmstat.c            |    5 +++++
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d42f179..ad48963 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -106,6 +106,8 @@ enum zone_stat_item {
>  	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
>  	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
>  	NR_PAGES_CLEANED,	/* number of times pages enter writeback */
> +	NR_DIRTY_THRESHOLD,	/* writeback threshold */
> +	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */

Don't need this even though we add this two fields into /proc/sys/vm.
It can be calculated at displaing time.



>  #ifdef CONFIG_NUMA
>  	NUMA_HIT,		/* allocated in intended node */
>  	NUMA_MISS,		/* allocated in non intended node */
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 8521475..2342010 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -17,6 +17,7 @@
>  #include <linux/vmstat.h>
>  #include <linux/sched.h>
>  #include <linux/math64.h>
> +#include <linux/writeback.h>
>  
>  #ifdef CONFIG_VM_EVENT_COUNTERS
>  DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> @@ -734,6 +735,8 @@ static const char * const vmstat_text[] = {
>  	"nr_shmem",
>  	"nr_dirtied",
>  	"nr_cleaned",
> +	"nr_dirty_threshold",
> +	"nr_dirty_background_threshold",
>  
>  #ifdef CONFIG_NUMA
>  	"numa_hit",
> @@ -917,6 +920,8 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
>  		return ERR_PTR(-ENOMEM);
>  	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
>  		v[i] = global_page_state(i);
> +
> +	global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD, v + NR_DIRTY_THRESHOLD);
>  #ifdef CONFIG_VM_EVENT_COUNTERS
>  	e = v + NR_VM_ZONE_STAT_ITEMS;
>  	all_vm_events(e);
> -- 
> 1.7.1
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-30  0:28   ` KOSAKI Motohiro
@ 2010-08-30 16:25     ` Michael Rubin
  2010-08-31  1:07       ` KOSAKI Motohiro
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-30 16:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, linux-fsdevel, linux-mm, fengguang.wu, jack, riel,
	akpm, david, npiggin, hch, axboe

On Sun, Aug 29, 2010 at 5:28 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> afaict, you and wu agreed /debug/bdi/default/stats is enough good.
> why do you change your mention?

I commented on this in the 0/4 email of the bug. I think these belong
in /proc/vmstat but I saw they exist in /debug/bdi/default/stats. I
figure they will probably not be accepted but I thought it was worth
attaching for consideration of upgrading from debugfs to /proc.

mrubin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-30 16:25     ` Michael Rubin
@ 2010-08-31  1:07       ` KOSAKI Motohiro
  2010-08-31  1:32         ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-08-31  1:07 UTC (permalink / raw)
  To: Michael Rubin
  Cc: kosaki.motohiro, linux-kernel, linux-fsdevel, linux-mm,
	fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe

> On Sun, Aug 29, 2010 at 5:28 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> > afaict, you and wu agreed /debug/bdi/default/stats is enough good.
> > why do you change your mention?
> 
> I commented on this in the 0/4 email of the bug. I think these belong
> in /proc/vmstat but I saw they exist in /debug/bdi/default/stats. I
> figure they will probably not be accepted but I thought it was worth
> attaching for consideration of upgrading from debugfs to /proc.

For reviewers view, we are reviewing your patch to merge immediately if all issue are fixed.
Then, I'm unhappy if you don't drop merge blocker item even though you merely want asking.
At least, you can make separate thread, no?

Of cource, wen other user also want to expose via /proc interface, we are resume
this discusstion gradly.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-31  1:07       ` KOSAKI Motohiro
@ 2010-08-31  1:32         ` Wu Fengguang
  0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2010-08-31  1:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Michael Rubin, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz,
	riel@redhat.com, akpm@linux-foundation.org, david@fromorbit.com,
	npiggin@kernel.dk, hch@lst.de, axboe@kernel.dk

On Tue, Aug 31, 2010 at 09:07:32AM +0800, KOSAKI Motohiro wrote:
> > On Sun, Aug 29, 2010 at 5:28 PM, KOSAKI Motohiro
> > <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > afaict, you and wu agreed /debug/bdi/default/stats is enough good.
> > > why do you change your mention?
> > 
> > I commented on this in the 0/4 email of the bug. I think these belong
> > in /proc/vmstat but I saw they exist in /debug/bdi/default/stats. I
> > figure they will probably not be accepted but I thought it was worth
> > attaching for consideration of upgrading from debugfs to /proc.
> 
> For reviewers view, we are reviewing your patch to merge immediately if all issue are fixed.
> Then, I'm unhappy if you don't drop merge blocker item even though you merely want asking.
> At least, you can make separate thread, no?
> 
> Of cource, wen other user also want to expose via /proc interface, we are resume
> this discusstion gradly.

Michael asked promoting the dirty thresholds from debugfs to /proc.
As a developer I'd interpret the question as: will there be enough
applications/admins using it? If not, we'd better keep it as debugfs.
Otherwise it benefits to do the interface promotion now, because it
will hurt to accumulate many end user dependencies on debugfs over
time..

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-08-28 23:50   ` Wu Fengguang
@ 2010-08-31  6:09     ` Michael Rubin
  2010-08-31  7:48       ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Rubin @ 2010-08-31  6:09 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, riel@redhat.com,
	akpm@linux-foundation.org, david@fromorbit.com,
	kosaki.motohiro@jp.fujitsu.com, npiggin@kernel.dk, hch@lst.de,
	axboe@kernel.dk

On Sat, Aug 28, 2010 at 4:50 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> It's silly to have the different names nr_dirtied and pages_cleaned
> for the same item.

I agree. Will fix.

> The output format is quite different from /proc/vmstat.
> Do we really need to "Node X", ":" and "times" decorations?

Node X is based on the meminfo file but I agree it's redundant information.

> And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
> the "_page" in node_page_state(). It's a bit long to be a pleasant
> name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.

Yeah. Will fix.


>> +static SYSDEV_ATTR(vmstat, S_IRUGO, node_read_vmstat, NULL);
>> +
>>  static ssize_t node_read_distance(struct sys_device * dev,
>>                       struct sysdev_attribute *attr, char * buf)
>>  {
>> @@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
>>               sysdev_create_file(&node->sysdev, &attr_meminfo);
>>               sysdev_create_file(&node->sysdev, &attr_numastat);
>>               sysdev_create_file(&node->sysdev, &attr_distance);
>> +             sysdev_create_file(&node->sysdev, &attr_vmstat);
>>
>>               scan_unevictable_register_node(node);
>>
>> @@ -267,6 +280,7 @@ void unregister_node(struct node *node)
>>       sysdev_remove_file(&node->sysdev, &attr_meminfo);
>>       sysdev_remove_file(&node->sysdev, &attr_numastat);
>>       sysdev_remove_file(&node->sysdev, &attr_distance);
>> +     sysdev_remove_file(&node->sysdev, &attr_vmstat);
>>
>>       scan_unevictable_unregister_node(node);
>>       hugetlb_unregister_node(node);          /* no-op, if memoryless node */
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 6e6e626..d42f179 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -104,6 +104,8 @@ enum zone_stat_item {
>>       NR_ISOLATED_ANON,       /* Temporary isolated pages from anon lru */
>>       NR_ISOLATED_FILE,       /* Temporary isolated pages from file lru */
>>       NR_SHMEM,               /* shmem pages (included tmpfs/GEM pages) */
>> +     NR_FILE_PAGES_DIRTIED,  /* number of times pages get dirtied */
>> +     NR_PAGES_CLEANED,       /* number of times pages enter writeback */
>
> How about the comments /* accumulated number of pages ... */?

OK.
May not get patch out today but will incorporate these fixes. Thank you again.

mrubin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-08-31  6:09     ` Michael Rubin
@ 2010-08-31  7:48       ` Wu Fengguang
  2010-09-05 14:17         ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2010-08-31  7:48 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, riel@redhat.com,
	akpm@linux-foundation.org, david@fromorbit.com,
	kosaki.motohiro@jp.fujitsu.com, npiggin@kernel.dk, hch@lst.de,
	axboe@kernel.dk

> > The output format is quite different from /proc/vmstat.
> > Do we really need to "Node X", ":" and "times" decorations?
> 
> Node X is based on the meminfo file but I agree it's redundant information.

Thanks. In the same directory you can find a different style example
/sys/devices/system/node/node0/numastat :) If ever the file was named
vmstat! In the other hand, shall we put the numbers there? I'm confused..

> > And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
> > the "_page" in node_page_state(). It's a bit long to be a pleasant
> > name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.
> 
> Yeah. Will fix.

Thanks. This is kind of nitpick, however here is another name by
Jan Kara: BDI_WRITTEN. BDI_WRITTEN may not be a lot better than
BDI_CLEANED, but here is a patch based on Jan's code. I'm cooking
more patches that make use of this per-bdi counter to estimate the
bdi's write bandwidth, and to further decide the optimal (large)
writeback chunk size as well as to do IO-less balance_dirty_pages().

Basically BDI_WRITTEN and NR_CLEANED are accounting for the same
thing in different dimensions. So it would be good if we can use
the same naming scheme to avoid confusing users: either to use
BDI_WRITTEN and NR_WRITTEN, or use BDI_CLEANED and NR_CLEANED.
What's your opinion?

Thanks,
Fengguang

---
writeback: account per-bdi accumulated written pages

This steals code from Jan's balance_dirty_pages() patch.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    1 +
 mm/backing-dev.c            |    6 ++++--
 mm/page-writeback.c         |    1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h	2010-08-29 23:32:46.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2010-08-29 23:52:50.000000000 +0800
@@ -40,6 +40,7 @@ typedef int (congested_fn)(void *, int);
 enum bdi_stat_item {
 	BDI_RECLAIMABLE,
 	BDI_WRITEBACK,
+	BDI_WRITTEN,
 	NR_BDI_STAT_ITEMS
 };
 
--- linux-next.orig/mm/backing-dev.c	2010-08-29 23:32:46.000000000 +0800
+++ linux-next/mm/backing-dev.c	2010-08-29 23:52:50.000000000 +0800
@@ -91,6 +91,7 @@ static int bdi_debug_stats_show(struct s
 		   "BdiDirtyThresh:   %8lu kB\n"
 		   "DirtyThresh:      %8lu kB\n"
 		   "BackgroundThresh: %8lu kB\n"
+		   "BdiWritten:       %8lu kB\n"
 		   "b_dirty:          %8lu\n"
 		   "b_io:             %8lu\n"
 		   "b_more_io:        %8lu\n"
@@ -98,8 +99,9 @@ static int bdi_debug_stats_show(struct s
 		   "state:            %8lx\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
 		   (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
-		   K(bdi_thresh), K(dirty_thresh),
-		   K(background_thresh), nr_dirty, nr_io, nr_more_io,
+		   K(bdi_thresh), K(dirty_thresh), K(background_thresh),
+		   (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)),
+		   nr_dirty, nr_io, nr_more_io,
 		   !list_empty(&bdi->bdi_list), bdi->state);
 #undef K
 
--- linux-next.orig/mm/page-writeback.c	2010-08-29 23:52:47.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-08-29 23:52:50.000000000 +0800
@@ -1305,6 +1305,7 @@ int test_clear_page_writeback(struct pag
 						PAGECACHE_TAG_WRITEBACK);
 			if (bdi_cap_account_writeback(bdi)) {
 				__dec_bdi_stat(bdi, BDI_WRITEBACK);
+				__inc_bdi_stat(bdi, BDI_WRITTEN);
 				__bdi_writeout_inc(bdi);
 			}
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-08-31  7:48       ` Wu Fengguang
@ 2010-09-05 14:17         ` Wu Fengguang
  2010-09-06  1:00           ` KOSAKI Motohiro
  0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2010-09-05 14:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael Rubin, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz,
	riel@redhat.com, david@fromorbit.com,
	kosaki.motohiro@jp.fujitsu.com, npiggin@kernel.dk, hch@lst.de,
	axboe@kernel.dk

On Tue, Aug 31, 2010 at 03:48:25PM +0800, Wu Fengguang wrote:
> > > The output format is quite different from /proc/vmstat.
> > > Do we really need to "Node X", ":" and "times" decorations?
> > 
> > Node X is based on the meminfo file but I agree it's redundant information.
> 
> Thanks. In the same directory you can find a different style example
> /sys/devices/system/node/node0/numastat :) If ever the file was named
> vmstat! In the other hand, shall we put the numbers there? I'm confused..

With wider use of NUMA, I'm expecting more interests to put
/proc/vmstat items into /sys/devices/system/node/node0/.

What shall we do then? There are several possible options:
- just put the /proc/vmstat items into nodeX/numastat
- create nodeX/vmstat and make numastat a symlink to vmstat
- create nodeX/vmstat and remove numastat in future

Any suggestions?

> > > And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
> > > the "_page" in node_page_state(). It's a bit long to be a pleasant
> > > name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.
> > 
> > Yeah. Will fix.
> 
> Thanks. This is kind of nitpick, however here is another name by
> Jan Kara: BDI_WRITTEN. BDI_WRITTEN may not be a lot better than
> BDI_CLEANED, but here is a patch based on Jan's code. I'm cooking
> more patches that make use of this per-bdi counter to estimate the
> bdi's write bandwidth, and to further decide the optimal (large)
> writeback chunk size as well as to do IO-less balance_dirty_pages().
> 
> Basically BDI_WRITTEN and NR_CLEANED are accounting for the same
> thing in different dimensions. So it would be good if we can use
> the same naming scheme to avoid confusing users: either to use
> BDI_WRITTEN and NR_WRITTEN, or use BDI_CLEANED and NR_CLEANED.
> What's your opinion?

I tend to prefer *_WRITTEN now.
- *_WRITTEN reminds the users about IO, *_CLEANED is less so obvious.
- *_CLEANED seems to be paired with NR_DIRTIED, this could be
  misleading to the users. The fact is, dirty pages may either be
  written to disk, or dropped (by truncate).

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-09-05 14:17         ` Wu Fengguang
@ 2010-09-06  1:00           ` KOSAKI Motohiro
  2010-09-06  1:34             ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-09-06  1:00 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Andrew Morton, Michael Rubin,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, riel@redhat.com,
	david@fromorbit.com, npiggin@kernel.dk, hch@lst.de,
	axboe@kernel.dk

> On Tue, Aug 31, 2010 at 03:48:25PM +0800, Wu Fengguang wrote:
> > > > The output format is quite different from /proc/vmstat.
> > > > Do we really need to "Node X", ":" and "times" decorations?
> > > 
> > > Node X is based on the meminfo file but I agree it's redundant information.
> > 
> > Thanks. In the same directory you can find a different style example
> > /sys/devices/system/node/node0/numastat :) If ever the file was named
> > vmstat! In the other hand, shall we put the numbers there? I'm confused..
> 
> With wider use of NUMA, I'm expecting more interests to put
> /proc/vmstat items into /sys/devices/system/node/node0/.

I prefer to create /sys/devices/system/node/node0/zones/zone-DMA32/vmstat
because the VM is managing pages as per-zones.
but /sys/devices/system/node/node0/vmstat is also useful.


> 
> What shall we do then? There are several possible options:
> - just put the /proc/vmstat items into nodeX/numastat
> - create nodeX/vmstat and make numastat a symlink to vmstat
> - create nodeX/vmstat and remove numastat in future
> 
> Any suggestions?


I like 3rd option :)
In addition, I doubt we really need to remove numastat. It's not
so harmful.



> 
> > > > And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
> > > > the "_page" in node_page_state(). It's a bit long to be a pleasant
> > > > name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.
> > > 
> > > Yeah. Will fix.
> > 
> > Thanks. This is kind of nitpick, however here is another name by
> > Jan Kara: BDI_WRITTEN. BDI_WRITTEN may not be a lot better than
> > BDI_CLEANED, but here is a patch based on Jan's code. I'm cooking
> > more patches that make use of this per-bdi counter to estimate the
> > bdi's write bandwidth, and to further decide the optimal (large)
> > writeback chunk size as well as to do IO-less balance_dirty_pages().
> > 
> > Basically BDI_WRITTEN and NR_CLEANED are accounting for the same
> > thing in different dimensions. So it would be good if we can use
> > the same naming scheme to avoid confusing users: either to use
> > BDI_WRITTEN and NR_WRITTEN, or use BDI_CLEANED and NR_CLEANED.
> > What's your opinion?
> 
> I tend to prefer *_WRITTEN now.
> - *_WRITTEN reminds the users about IO, *_CLEANED is less so obvious.
> - *_CLEANED seems to be paired with NR_DIRTIED, this could be
>   misleading to the users. The fact is, dirty pages may either be
>   written to disk, or dropped (by truncate).

Umm...
If my understanding is correct, Michael really need *_CLEANED because
he want to compare NR_DIRTIED and *_CLEANED. That said, we need to
change counter implementation itself instead a name?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat
  2010-09-06  1:00           ` KOSAKI Motohiro
@ 2010-09-06  1:34             ` Wu Fengguang
  0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2010-09-06  1:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Michael Rubin, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz,
	riel@redhat.com, david@fromorbit.com, npiggin@kernel.dk,
	hch@lst.de, axboe@kernel.dk

On Mon, Sep 06, 2010 at 09:00:35AM +0800, KOSAKI Motohiro wrote:
> > On Tue, Aug 31, 2010 at 03:48:25PM +0800, Wu Fengguang wrote:
> > > > > The output format is quite different from /proc/vmstat.
> > > > > Do we really need to "Node X", ":" and "times" decorations?
> > > > 
> > > > Node X is based on the meminfo file but I agree it's redundant information.
> > > 
> > > Thanks. In the same directory you can find a different style example
> > > /sys/devices/system/node/node0/numastat :) If ever the file was named
> > > vmstat! In the other hand, shall we put the numbers there? I'm confused..
> > 
> > With wider use of NUMA, I'm expecting more interests to put
> > /proc/vmstat items into /sys/devices/system/node/node0/.
> 
> I prefer to create /sys/devices/system/node/node0/zones/zone-DMA32/vmstat
> because the VM is managing pages as per-zones.
> but /sys/devices/system/node/node0/vmstat is also useful.

Good points.

> > What shall we do then? There are several possible options:
> > - just put the /proc/vmstat items into nodeX/numastat
> > - create nodeX/vmstat and make numastat a symlink to vmstat
> > - create nodeX/vmstat and remove numastat in future
> > 
> > Any suggestions?
> 
> 
> I like 3rd option :)
> In addition, I doubt we really need to remove numastat. It's not
> so harmful.

Yeah 4th option: keep numastat while introducing the above interfaces.
The contents might be duplicated, but not a big issue.

> > 
> > > > > And the "_PAGES" in NR_FILE_PAGES_DIRTIED looks redundant to
> > > > > the "_page" in node_page_state(). It's a bit long to be a pleasant
> > > > > name. NR_FILE_DIRTIED/NR_CLEANED looks nicer.
> > > > 
> > > > Yeah. Will fix.
> > > 
> > > Thanks. This is kind of nitpick, however here is another name by
> > > Jan Kara: BDI_WRITTEN. BDI_WRITTEN may not be a lot better than
> > > BDI_CLEANED, but here is a patch based on Jan's code. I'm cooking
> > > more patches that make use of this per-bdi counter to estimate the
> > > bdi's write bandwidth, and to further decide the optimal (large)
> > > writeback chunk size as well as to do IO-less balance_dirty_pages().
> > > 
> > > Basically BDI_WRITTEN and NR_CLEANED are accounting for the same
> > > thing in different dimensions. So it would be good if we can use
> > > the same naming scheme to avoid confusing users: either to use
> > > BDI_WRITTEN and NR_WRITTEN, or use BDI_CLEANED and NR_CLEANED.
> > > What's your opinion?
> > 
> > I tend to prefer *_WRITTEN now.
> > - *_WRITTEN reminds the users about IO, *_CLEANED is less so obvious.
> > - *_CLEANED seems to be paired with NR_DIRTIED, this could be
> >   misleading to the users. The fact is, dirty pages may either be
> >   written to disk, or dropped (by truncate).
> 
> Umm...
> If my understanding is correct, Michael really need *_CLEANED because
> he want to compare NR_DIRTIED and *_CLEANED. That said, we need to
> change counter implementation itself instead a name?

It's only about naming :) Michael want to do per-zone accounting and I
also need to do per-bdi accounting, for basically the same event. So
I'm proposing to name it consistently.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-09-06  1:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-28  2:40 [PATCH 0/4] writeback: kernel visibility Michael Rubin
2010-08-28  2:40 ` [PATCH 1/4] mm: exporting account_page_dirty Michael Rubin
2010-08-28 22:11   ` Sage Weil
2010-08-28  2:40 ` [PATCH 2/4] mm: account_page_writeback added Michael Rubin
2010-08-28  2:40 ` [PATCH 3/4] writeback: nr_dirtied and nr_cleaned in /proc/vmstat Michael Rubin
2010-08-28 23:50   ` Wu Fengguang
2010-08-31  6:09     ` Michael Rubin
2010-08-31  7:48       ` Wu Fengguang
2010-09-05 14:17         ` Wu Fengguang
2010-09-06  1:00           ` KOSAKI Motohiro
2010-09-06  1:34             ` Wu Fengguang
2010-08-28  2:40 ` [PATCH 4/4] writeback: Reporting dirty thresholds " Michael Rubin
2010-08-30  0:28   ` KOSAKI Motohiro
2010-08-30 16:25     ` Michael Rubin
2010-08-31  1:07       ` KOSAKI Motohiro
2010-08-31  1:32         ` Wu Fengguang
  -- strict thread matches above, loose matches on Subject: below --
2010-08-20  9:31 [PATCH 0/4] writeback: kernel visibility Michael Rubin
2010-08-20  9:31 ` [PATCH 2/4] mm: account_page_writeback added Michael Rubin
2010-08-20  9:45   ` Wu Fengguang
2010-08-20 10:08     ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).