[PATCH] bdi flusher should not be throttled here when it fall into buddy slow path

linux-mm.kvack.org archive mirror

* [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
@ 2016-10-18  7:12 zhouxianrong
  2016-10-18  9:34 ` Hillf Danton
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: zhouxianrong @ 2016-10-18  7:12 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, viro, mingo, peterz, hannes, mgorman, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxianrong, zhouxiyu,
	zhangshiming5, won.ho.park, tuxiaobing

From: z00281421 <z00281421@notesmail.huawei.com>

The bdi flusher may enter the page allocator slow path via writepage and
kmalloc. In that case the flusher, acting as a direct reclaimer, should not
be throttled here, because while it is throttled it cannot go on to reclaim
the clean file pages or anonymous pages that follow. Furthermore, the
writeback rate of dirty pages would slow down, and other direct reclaimers
and kswapd would be affected. The bdi flusher should be io-scheduled by
get_request rather than here.

Signed-off-by: z00281421 <z00281421@notesmail.huawei.com>
---
 fs/fs-writeback.c     |    4 ++--
 include/linux/sched.h |    1 +
 mm/vmscan.c           |   15 +++++++++++----
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05713a5..f6bf067 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1908,7 +1908,7 @@ void wb_workfn(struct work_struct *work)
 	long pages_written;
 
 	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
-	current->flags |= PF_SWAPWRITE;
+	current->flags |= (PF_SWAPWRITE | PF_BDI_FLUSHER | PF_LESS_THROTTLE);
 
 	if (likely(!current_is_workqueue_rescuer() ||
 		   !test_bit(WB_registered, &wb->state))) {
@@ -1938,7 +1938,7 @@ void wb_workfn(struct work_struct *work)
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		wb_wakeup_delayed(wb);
 
-	current->flags &= ~PF_SWAPWRITE;
+	current->flags &= ~(PF_SWAPWRITE | PF_BDI_FLUSHER | PF_LESS_THROTTLE);
 }
 
 /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..4bb70f2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2232,6 +2232,7 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
 #define PF_SWAPWRITE	0x00800000	/* Allowed to write to swap */
+#define PF_BDI_FLUSHER  0x01000000	/* I am bdi flusher */
 #define PF_NO_SETAFFINITY 0x04000000	/* Userland is not allowed to meddle with cpus_allowed */
 #define PF_MCE_EARLY    0x08000000      /* Early kill for mce process policy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fe8b71..492e9e7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1643,12 +1643,19 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
  * If a kernel thread (such as nfsd for loop-back mounts) services
  * a backing device by writing to the page cache it sets PF_LESS_THROTTLE.
  * In that case we should only throttle if the backing device it is
- * writing to is congested.  In other cases it is safe to throttle.
+ * writing to is congested.  Another case is the bdi flusher, which must
+ * not be throttled here even when its own bdi is congested.
+ * In other cases it is safe to throttle.
  */
-static int current_may_throttle(void)
+static bool current_may_throttle(void)
 {
-	return !(current->flags & PF_LESS_THROTTLE) ||
-		current->backing_dev_info == NULL ||
+	if (!(current->flags & PF_LESS_THROTTLE))
+		return true;
+
+	if (current->flags & PF_BDI_FLUSHER)
+		return false;
+
+	return current->backing_dev_info == NULL ||
 		bdi_write_congested(current->backing_dev_info);
 }
 
-- 
1.7.9.5
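
For readers following the thread, the effect of the mm/vmscan.c hunk above can be sketched as a small user-space model. This is only an illustration and not kernel code: struct task_model, struct bdi_model and both function names are hypothetical stand-ins, the PF_LESS_THROTTLE value is an assumption, and PF_BDI_FLUSHER uses the value proposed in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* PF_BDI_FLUSHER is the value proposed in the patch; the
 * PF_LESS_THROTTLE value is an assumption made for this sketch. */
#define PF_LESS_THROTTLE 0x00100000u
#define PF_BDI_FLUSHER   0x01000000u

/* Hypothetical stand-ins for the kernel structures involved. */
struct bdi_model {
	bool write_congested;
};

struct task_model {
	unsigned int flags;
	const struct bdi_model *backing_dev_info;
};

/* Predicate as it reads on the lines the patch removes. */
bool may_throttle_old(const struct task_model *t)
{
	return !(t->flags & PF_LESS_THROTTLE) ||
		t->backing_dev_info == NULL ||
		t->backing_dev_info->write_congested;
}

/* Predicate as it reads on the lines the patch adds. */
bool may_throttle_new(const struct task_model *t)
{
	/* Tasks without PF_LESS_THROTTLE are always eligible. */
	if (!(t->flags & PF_LESS_THROTTLE))
		return true;

	/* The flusher is exempt regardless of bdi state. */
	if (t->flags & PF_BDI_FLUSHER)
		return false;

	return t->backing_dev_info == NULL ||
		t->backing_dev_info->write_congested;
}
```

If one assumes that the flusher runs with a NULL backing_dev_info pointer (an assumption, the thread does not state it), then setting PF_LESS_THROTTLE alone would leave may_throttle_old() returning true, and only the additional PF_BDI_FLUSHER check changes the outcome.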

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-18  7:12 [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path zhouxianrong
@ 2016-10-18  9:34 ` Hillf Danton
  2016-10-18  9:59 ` Mel Gorman
  2016-10-20 12:38 ` zhouxianrong
  2 siblings, 0 replies; 8+ messages in thread
From: Hillf Danton @ 2016-10-18  9:34 UTC (permalink / raw)
  To: zhouxianrong, linux-mm
  Cc: linux-kernel, akpm, viro, mingo, peterz, hannes, mgorman, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxiyu, zhangshiming5,
	won.ho.park, tuxiaobing

> @@ -1908,7 +1908,7 @@ void wb_workfn(struct work_struct *work)
>  	long pages_written;
> 
>  	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
> -	current->flags |= PF_SWAPWRITE;

If flags already carries PF_LESS_THROTTLE before it is modified, then
you have to restore it.

> +	current->flags |= (PF_SWAPWRITE | PF_BDI_FLUSHER | PF_LESS_THROTTLE);
> 
>  	if (likely(!current_is_workqueue_rescuer() ||
>  		   !test_bit(WB_registered, &wb->state))) {
> @@ -1938,7 +1938,7 @@ void wb_workfn(struct work_struct *work)
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);
> 
> -	current->flags &= ~PF_SWAPWRITE;
> +	current->flags &= ~(PF_SWAPWRITE | PF_BDI_FLUSHER | PF_LESS_THROTTLE);
>  }
> 
thanks
Hillf
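
The save-and-restore pattern this review comment asks for can be sketched in user space as follows. It is a minimal illustration under stated assumptions: struct task_model, enter_flusher() and leave_flusher() are hypothetical names, PF_SWAPWRITE uses the value visible in the diff context, and the PF_LESS_THROTTLE value is assumed.

```c
/* PF_SWAPWRITE is taken from the diff context; the
 * PF_LESS_THROTTLE value is an assumption for this sketch. */
#define PF_LESS_THROTTLE 0x00100000u
#define PF_SWAPWRITE     0x00800000u

/* Hypothetical stand-in for the task's flag word. */
struct task_model {
	unsigned int flags;
};

/* Set every bit in mask and return the bits of mask that were
 * already set, so the caller can put them back later. */
unsigned int enter_flusher(struct task_model *t, unsigned int mask)
{
	unsigned int saved = t->flags & mask;

	t->flags |= mask;
	return saved;
}

/* Clear the bits in mask, then restore the ones that were set
 * before enter_flusher() ran. */
void leave_flusher(struct task_model *t, unsigned int mask,
		   unsigned int saved)
{
	t->flags &= ~mask;
	t->flags |= saved & mask;
}
```

With an unconditional `flags &= ~mask` on exit, a task that entered with PF_LESS_THROTTLE already set would leave without it; the saved value avoids that.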

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-18  7:12 [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path zhouxianrong
  2016-10-18  9:34 ` Hillf Danton
@ 2016-10-18  9:59 ` Mel Gorman
  2016-10-18 11:08   ` zhouxianrong
  2016-10-20 12:38 ` zhouxianrong
  2 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2016-10-18  9:59 UTC (permalink / raw)
  To: zhouxianrong
  Cc: linux-mm, linux-kernel, akpm, viro, mingo, peterz, hannes, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxiyu, zhangshiming5,
	won.ho.park, tuxiaobing

On Tue, Oct 18, 2016 at 03:12:45PM +0800, zhouxianrong@huawei.com wrote:
> From: z00281421 <z00281421@notesmail.huawei.com>
> 
> The bdi flusher may enter the page allocator slow path via writepage and
> kmalloc. In that case the flusher, acting as a direct reclaimer, should not
> be throttled here, because while it is throttled it cannot go on to reclaim
> the clean file pages or anonymous pages that follow. Furthermore, the
> writeback rate of dirty pages would slow down, and other direct reclaimers
> and kswapd would be affected. The bdi flusher should be io-scheduled by
> get_request rather than here.
> 
> Signed-off-by: z00281421 <z00281421@notesmail.huawei.com>

What does this patch do that PF_LESS_THROTTLE is not doing already if
there is an underlying BDI?

There have been a few patches like this recently that look like they might
do something useful but are subtle. They really should be accompanied by
a test case and data showing they either fix a functional issue (machine
livelocking due to writeback not making progress) or a performance issue.

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-18  9:59 ` Mel Gorman
@ 2016-10-18 11:08   ` zhouxianrong
  2016-10-18 11:42     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: zhouxianrong @ 2016-10-18 11:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, akpm, viro, mingo, peterz, hannes, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxiyu, zhangshiming5,
	won.ho.park, tuxiaobing

Call trace:
[<ffffffc0000863dc>] __switch_to+0x80/0x98
[<ffffffc001160c58>] __schedule+0x314/0x854
[<ffffffc0011611e0>] schedule+0x48/0xa4
[<ffffffc0011648c4>] schedule_timeout+0x158/0x2c8
[<ffffffc0011608b4>] io_schedule_timeout+0xbc/0x14c
[<ffffffc0001aec84>] wait_iff_congested+0x1d4/0x1ec
[<ffffffc0001a36b0>] shrink_inactive_list+0x530/0x760
[<ffffffc0001a3e14>] shrink_lruvec+0x534/0x76c
[<ffffffc0001a40d4>] shrink_zone+0x88/0x1b8
[<ffffffc0001a4444>] do_try_to_free_pages+0x240/0x478
[<ffffffc0001a4788>] try_to_free_pages+0x10c/0x284
[<ffffffc0001968a4>] __alloc_pages_nodemask+0x540/0x918
[<ffffffc0001dd0e8>] new_slab+0x334/0x4a0
[<ffffffc0001df37c>] __slab_alloc.isra.75.constprop.77+0x6bc/0x780
[<ffffffc0001df584>] kmem_cache_alloc+0x144/0x23c
[<ffffffc00018f040>] mempool_alloc_slab+0x2c/0x38
[<ffffffc00018f1f4>] mempool_alloc+0x7c/0x188
[<ffffffc0003f462c>] bio_alloc_bioset+0x1cc/0x254
[<ffffffc00022a430>] _submit_bh+0x74/0x1c8
[<ffffffc00022c9d0>] __block_write_full_page.constprop.33+0x1a0/0x40c
[<ffffffc00022cd1c>] block_write_full_page+0xe0/0x134
[<ffffffc00022da64>] blkdev_writepage+0x30/0x3c
[<ffffffc000197d08>] __writepage+0x34/0x74
[<ffffffc000198880>] write_cache_pages+0x1e8/0x450
[<ffffffc000198b3c>] generic_writepages+0x54/0x8c
[<ffffffc00019a990>] do_writepages+0x40/0x6c
[<ffffffc00021e604>] __writeback_single_inode+0x60/0x51c
[<ffffffc00021eeec>] writeback_sb_inodes+0x2d4/0x46c
[<ffffffc00021f128>] __writeback_inodes_wb+0xa4/0xe8
[<ffffffc00021f480>] wb_writeback+0x314/0x3fc
[<ffffffc000220224>] bdi_writeback_workfn+0x130/0x4e0
[<ffffffc0000be4d4>] process_one_work+0x18c/0x51c
[<ffffffc0000bedd8>] worker_thread+0x15c/0x51c
[<ffffffc0000c5718>] kthread+0x10c/0x120

The above call trace occurred while writing to an SD card under heavy,
sustained pressure. The patch addresses a performance issue. I would like the
flusher not to be throttled here, so that it can reclaim the following clean
file pages or anonymous pages on the LRU list and then return to writing the
remaining dirty pages of the inode. That would speed up writeback of dirty
pages, so other direct reclaimers could reclaim more clean pages. When memory
is low because of a large page cache, bdi writeback speed plays a key role.


On 2016/10/18 17:59, Mel Gorman wrote:
> On Tue, Oct 18, 2016 at 03:12:45PM +0800, zhouxianrong@huawei.com wrote:
>> From: z00281421 <z00281421@notesmail.huawei.com>
>>
>> The bdi flusher may enter the page allocator slow path via writepage and
>> kmalloc. In that case the flusher, acting as a direct reclaimer, should not
>> be throttled here, because while it is throttled it cannot go on to reclaim
>> the clean file pages or anonymous pages that follow. Furthermore, the
>> writeback rate of dirty pages would slow down, and other direct reclaimers
>> and kswapd would be affected. The bdi flusher should be io-scheduled by
>> get_request rather than here.
>>
>> Signed-off-by: z00281421 <z00281421@notesmail.huawei.com>
>
> What does this patch do that PF_LESS_THROTTLE is not doing already if
> there is an underlying BDI?
>
> There have been a few patches like this recently that look like they might
> do something useful but are subtle. They really should be accompanied by
> a test case and data showing they either fix a functional issue (machine
> livelocking due to writeback not making progress) or a performance issue.
>

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-18 11:08   ` zhouxianrong
@ 2016-10-18 11:42     ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2016-10-18 11:42 UTC (permalink / raw)
  To: zhouxianrong
  Cc: Mel Gorman, linux-mm, linux-kernel, akpm, viro, mingo, peterz,
	hannes, vbabka, vdavydov.dev, minchan, riel, zhouxiyu,
	zhangshiming5, won.ho.park, tuxiaobing

On Tue 18-10-16 19:08:05, zhouxianrong wrote:
> Call trace:
> [<ffffffc0000863dc>] __switch_to+0x80/0x98
> [<ffffffc001160c58>] __schedule+0x314/0x854
> [<ffffffc0011611e0>] schedule+0x48/0xa4
> [<ffffffc0011648c4>] schedule_timeout+0x158/0x2c8
> [<ffffffc0011608b4>] io_schedule_timeout+0xbc/0x14c
> [<ffffffc0001aec84>] wait_iff_congested+0x1d4/0x1ec
> [<ffffffc0001a36b0>] shrink_inactive_list+0x530/0x760
> [<ffffffc0001a3e14>] shrink_lruvec+0x534/0x76c
> [<ffffffc0001a40d4>] shrink_zone+0x88/0x1b8
> [<ffffffc0001a4444>] do_try_to_free_pages+0x240/0x478
> [<ffffffc0001a4788>] try_to_free_pages+0x10c/0x284
> [<ffffffc0001968a4>] __alloc_pages_nodemask+0x540/0x918
> [<ffffffc0001dd0e8>] new_slab+0x334/0x4a0
> [<ffffffc0001df37c>] __slab_alloc.isra.75.constprop.77+0x6bc/0x780
> [<ffffffc0001df584>] kmem_cache_alloc+0x144/0x23c
> [<ffffffc00018f040>] mempool_alloc_slab+0x2c/0x38
> [<ffffffc00018f1f4>] mempool_alloc+0x7c/0x188
> [<ffffffc0003f462c>] bio_alloc_bioset+0x1cc/0x254
> [<ffffffc00022a430>] _submit_bh+0x74/0x1c8
> [<ffffffc00022c9d0>] __block_write_full_page.constprop.33+0x1a0/0x40c
> [<ffffffc00022cd1c>] block_write_full_page+0xe0/0x134
> [<ffffffc00022da64>] blkdev_writepage+0x30/0x3c
> [<ffffffc000197d08>] __writepage+0x34/0x74
> [<ffffffc000198880>] write_cache_pages+0x1e8/0x450
> [<ffffffc000198b3c>] generic_writepages+0x54/0x8c
> [<ffffffc00019a990>] do_writepages+0x40/0x6c
> [<ffffffc00021e604>] __writeback_single_inode+0x60/0x51c
> [<ffffffc00021eeec>] writeback_sb_inodes+0x2d4/0x46c
> [<ffffffc00021f128>] __writeback_inodes_wb+0xa4/0xe8
> [<ffffffc00021f480>] wb_writeback+0x314/0x3fc
> [<ffffffc000220224>] bdi_writeback_workfn+0x130/0x4e0
> [<ffffffc0000be4d4>] process_one_work+0x18c/0x51c
> [<ffffffc0000bedd8>] worker_thread+0x15c/0x51c
> [<ffffffc0000c5718>] kthread+0x10c/0x120
> 
> The above call trace occurred while writing to an SD card under heavy,
> sustained pressure. The patch addresses a performance issue. I would like the
> flusher not to be throttled here, so that it can reclaim the following clean
> file pages or anonymous pages on the LRU list and then return to writing the
> remaining dirty pages of the inode. That would speed up writeback of dirty
> pages, so other direct reclaimers could reclaim more clean pages. When memory
> is low because of a large page cache, bdi writeback speed plays a key role.

If we got here then we are hitting dirty/writeback pages at the
tail of the LRU list and the bdi is congested. So most probably there are
no clean pages and the storage cannot keep up with that IO.

Why do you think that not throttling would help here? Do you really see
that further reclaim makes forward progress, or does it just waste
more CPU without doing any useful work?

In other words much more information please!
-- 
Michal Hocko
SUSE Labs
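
The condition described in this reply, "dirty/writeback pages at the tail of the LRU and a congested bdi", can be illustrated with a small decision model. This is a sketch under assumptions: it models wait_iff_congested() as sleeping only when at least one bdi is write-congested and the node has been marked congested, and as merely yielding otherwise. That description of the kernel's behaviour at the time is an assumption made for illustration, and the enum, the parameters and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of the throttling decision. */
enum wait_action {
	WAIT_YIELD,	/* give up the CPU briefly, no sleep */
	WAIT_SLEEP	/* sleep until the timeout or until congestion clears */
};

/* Assumed model: sleep only when some bdi is write-congested
 * AND reclaim has flagged the node as congested. */
enum wait_action wait_iff_congested_model(int nr_congested_bdis,
					  bool node_congested)
{
	if (nr_congested_bdis == 0 || !node_congested)
		return WAIT_YIELD;

	return WAIT_SLEEP;
}
```

Under this model, the io_schedule_timeout() frame in the call trace would correspond to the WAIT_SLEEP branch, which is consistent with the point that both conditions held when the flusher was throttled.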

* [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-18  7:12 [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path zhouxianrong
  2016-10-18  9:34 ` Hillf Danton
  2016-10-18  9:59 ` Mel Gorman
@ 2016-10-20 12:38 ` zhouxianrong
  2016-10-20 13:05   ` Mika Penttilä
  2016-10-20 13:28   ` Michal Hocko
  2 siblings, 2 replies; 8+ messages in thread
From: zhouxianrong @ 2016-10-20 12:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, viro, mingo, peterz, hannes, mgorman, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxianrong, zhouxiyu,
	zhangshiming5, won.ho.park, tuxiaobing

From: z00281421 <z00281421@notesmail.huawei.com>

The bdi flusher should be throttled based only on its own bdi,
decoupled from the others.

Separate PGDAT_WRITEBACK into PGDAT_ANON_WRITEBACK and
PGDAT_FILE_WRITEBACK; this avoids throttling while scanning the anon LRU,
and it is then fine to throttle on file writeback.

I think the above may not be right.

Signed-off-by: z00281421 <z00281421@notesmail.huawei.com>
---
 fs/fs-writeback.c      |    8 ++++++--
 include/linux/mmzone.h |    7 +++++--
 mm/vmscan.c            |   20 ++++++++++++--------
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05713a5..ddcc70f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1905,10 +1905,13 @@ void wb_workfn(struct work_struct *work)
 {
 	struct bdi_writeback *wb = container_of(to_delayed_work(work),
 						struct bdi_writeback, dwork);
+	struct backing_dev_info *bdi = container_of(to_delayed_work(work),
+						struct backing_dev_info, wb.dwork);
 	long pages_written;
 
 	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
-	current->flags |= PF_SWAPWRITE;
+	current->flags |= (PF_SWAPWRITE | PF_LESS_THROTTLE);
+	current->bdi = bdi;
 
 	if (likely(!current_is_workqueue_rescuer() ||
 		   !test_bit(WB_registered, &wb->state))) {
@@ -1938,7 +1941,8 @@ void wb_workfn(struct work_struct *work)
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		wb_wakeup_delayed(wb);
 
-	current->flags &= ~PF_SWAPWRITE;
+	current->bdi = NULL;
+	current->flags &= ~(PF_SWAPWRITE | PF_LESS_THROTTLE);
 }
 
 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7f2ae99..fa602e9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -528,8 +528,11 @@ enum pgdat_flags {
 					 * many dirty file pages at the tail
 					 * of the LRU.
 					 */
-	PGDAT_WRITEBACK,		/* reclaim scanning has recently found
-					 * many pages under writeback
+	PGDAT_ANON_WRITEBACK,		/* reclaim scanning has recently found
+					 * many anonymous pages under writeback
+					 */
+	PGDAT_FILE_WRITEBACK,		/* reclaim scanning has recently found
+					 * many file pages under writeback
 					 */
 	PGDAT_RECLAIM_LOCKED,		/* prevents concurrent reclaim */
 };
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fe8b71..3f08ba3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -917,6 +917,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_writeback = 0;
 	unsigned long nr_immediate = 0;
+	int file;
 
 	cond_resched();
 
@@ -954,6 +955,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
 			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
+		file = page_is_file_cache(page);
+
 		/*
 		 * The number of dirty pages determines if a zone is marked
 		 * reclaim_congested which affects wait_iff_congested. kswapd
@@ -1016,7 +1019,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			/* Case 1 above */
 			if (current_is_kswapd() &&
 			    PageReclaim(page) &&
-			    test_bit(PGDAT_WRITEBACK, &pgdat->flags)) {
+			    test_bit(PGDAT_ANON_WRITEBACK + file, &pgdat->flags)) {
 				nr_immediate++;
 				goto keep_locked;
 
@@ -1643,13 +1646,14 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
  * If a kernel thread (such as nfsd for loop-back mounts) services
  * a backing device by writing to the page cache it sets PF_LESS_THROTTLE.
  * In that case we should only throttle if the backing device it is
- * writing to is congested.  In other cases it is safe to throttle.
+ * writing to is congested. The bdi flusher should be throttled based only
+ * on its own bdi, decoupled from the others. In other cases it is safe to throttle.
  */
-static int current_may_throttle(void)
+static int current_may_throttle(int file)
 {
 	return !(current->flags & PF_LESS_THROTTLE) ||
-		current->backing_dev_info == NULL ||
-		bdi_write_congested(current->backing_dev_info);
+		(file && (current->backing_dev_info == NULL ||
+		bdi_write_congested(current->backing_dev_info)));
 }
 
 static bool inactive_reclaimable_pages(struct lruvec *lruvec,
@@ -1774,7 +1778,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * are encountered in the nr_immediate check below.
 	 */
 	if (nr_writeback && nr_writeback == nr_taken)
-		set_bit(PGDAT_WRITEBACK, &pgdat->flags);
+		set_bit(PGDAT_ANON_WRITEBACK + file, &pgdat->flags);
 
 	/*
 	 * Legacy memcg will stall in page writeback so avoid forcibly
@@ -1803,7 +1807,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_immediate && current_may_throttle())
+		if (nr_immediate && current_may_throttle(file))
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
@@ -1813,7 +1817,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * unqueued dirty pages or cycling through the LRU too quickly.
 	 */
 	if (!sc->hibernation_mode && !current_is_kswapd() &&
-	    current_may_throttle())
+	    current_may_throttle(file))
 		wait_iff_congested(pgdat, BLK_RW_ASYNC, HZ/10);
 
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-- 
1.7.9.5
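
Two ideas in this second patch can be sketched in user space: selecting one of two adjacent flag bits with `PGDAT_ANON_WRITEBACK + file`, and the revised current_may_throttle(file) predicate. This is an illustration only. The enum is reduced to the three names visible in the diff, the PF_LESS_THROTTLE value is assumed, and the model structs and helper names (writeback_bit, mark_writeback, saw_writeback, may_throttle_v2) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define PF_LESS_THROTTLE 0x00100000u	/* assumed value */

/* Reduced copy of the enum from the diff; only the order matters. */
enum pgdat_flags_model {
	PGDAT_ANON_WRITEBACK,
	PGDAT_FILE_WRITEBACK,
	PGDAT_RECLAIM_LOCKED
};

/* The "+ file" indexing only works if the two bits are adjacent. */
_Static_assert(PGDAT_FILE_WRITEBACK == PGDAT_ANON_WRITEBACK + 1,
	       "anon/file writeback bits must be adjacent");

/* Map file (0 = anon, non-zero = file) to the matching bit number. */
int writeback_bit(int file)
{
	return PGDAT_ANON_WRITEBACK + (file ? 1 : 0);
}

void mark_writeback(unsigned long *flags, int file)
{
	*flags |= 1UL << writeback_bit(file);
}

bool saw_writeback(const unsigned long *flags, int file)
{
	return (*flags >> writeback_bit(file)) & 1UL;
}

struct bdi_model {
	bool write_congested;
};

struct task_model {
	unsigned int flags;
	const struct bdi_model *backing_dev_info;
};

/* Predicate as proposed in the patch: a PF_LESS_THROTTLE task is
 * never throttled on anon scans, and on file scans only when it has
 * no bdi or its bdi is write-congested. */
bool may_throttle_v2(const struct task_model *t, int file)
{
	return !(t->flags & PF_LESS_THROTTLE) ||
		(file && (t->backing_dev_info == NULL ||
			  t->backing_dev_info->write_congested));
}
```

The patch adds `file` directly, which relies on page_is_file_cache() returning exactly 0 or 1; the sketch normalises the value to make that dependency explicit.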

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-20 12:38 ` zhouxianrong
@ 2016-10-20 13:05   ` Mika Penttilä
  2016-10-20 13:28   ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Mika Penttilä @ 2016-10-20 13:05 UTC (permalink / raw)
  To: zhouxianrong, linux-mm
  Cc: linux-kernel, akpm, viro, mingo, peterz, hannes, mgorman, vbabka,
	mhocko, vdavydov.dev, minchan, riel, zhouxiyu, zhangshiming5,
	won.ho.park, tuxiaobing



On 20.10.2016 15:38, zhouxianrong@huawei.com wrote:
> From: z00281421 <z00281421@notesmail.huawei.com>
>
> The bdi flusher should be throttled based only on its own bdi,
> decoupled from the others.
>
> Separate PGDAT_WRITEBACK into PGDAT_ANON_WRITEBACK and
> PGDAT_FILE_WRITEBACK; this avoids throttling while scanning the anon LRU,
> and it is then fine to throttle on file writeback.
>
> I think the above may not be right.
>
> Signed-off-by: z00281421 <z00281421@notesmail.huawei.com>
> ---
>  fs/fs-writeback.c      |    8 ++++++--
>  include/linux/mmzone.h |    7 +++++--
>  mm/vmscan.c            |   20 ++++++++++++--------
>  3 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 05713a5..ddcc70f 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1905,10 +1905,13 @@ void wb_workfn(struct work_struct *work)
>  {
>  	struct bdi_writeback *wb = container_of(to_delayed_work(work),
>  						struct bdi_writeback, dwork);
> +	struct backing_dev_info *bdi = container_of(to_delayed_work(work),
> +						struct backing_dev_info, wb.dwork);
>  	long pages_written;
>  
>  	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
> -	current->flags |= PF_SWAPWRITE;
> +	current->flags |= (PF_SWAPWRITE | PF_LESS_THROTTLE);
> +	current->bdi = bdi;
>  
>  	if (likely(!current_is_workqueue_rescuer() ||
>  		   !test_bit(WB_registered, &wb->state))) {
> @@ -1938,7 +1941,8 @@ void wb_workfn(struct work_struct *work)
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);
>  
> -	current->flags &= ~PF_SWAPWRITE;
> +	current->bdi = NULL;
> +	current->flags &= ~(PF_SWAPWRITE | PF_LESS_THROTTLE);
>  }
>  
>  /*
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7f2ae99..fa602e9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -528,8 +528,11 @@ enum pgdat_flags {
>  					 * many dirty file pages at the tail
>  					 * of the LRU.
>  					 */
> -	PGDAT_WRITEBACK,		/* reclaim scanning has recently found
> -					 * many pages under writeback
> +	PGDAT_ANON_WRITEBACK,		/* reclaim scanning has recently found
> +					 * many anonymous pages under writeback
> +					 */
> +	PGDAT_FILE_WRITEBACK,		/* reclaim scanning has recently found
> +					 * many file pages under writeback
>  					 */
>  	PGDAT_RECLAIM_LOCKED,		/* prevents concurrent reclaim */

Nobody seems to be clearing those bits (the same was true of PGDAT_WRITEBACK)?


--Mika

* Re: [PATCH] bdi flusher should not be throttled here when it fall into buddy slow path
  2016-10-20 12:38 ` zhouxianrong
  2016-10-20 13:05   ` Mika Penttilä
@ 2016-10-20 13:28   ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2016-10-20 13:28 UTC (permalink / raw)
  To: zhouxianrong
  Cc: linux-mm, linux-kernel, akpm, viro, mingo, peterz, hannes,
	mgorman, vbabka, vdavydov.dev, minchan, riel, zhouxiyu,
	zhangshiming5, won.ho.park, tuxiaobing

On Thu 20-10-16 20:38:05, zhouxianrong@huawei.com wrote:
> From: z00281421 <z00281421@notesmail.huawei.com>
> 
> The bdi flusher should be throttled based only on its own bdi,
> decoupled from the others.
> 
> Separate PGDAT_WRITEBACK into PGDAT_ANON_WRITEBACK and
> PGDAT_FILE_WRITEBACK; this avoids throttling while scanning the anon LRU,
> and it is then fine to throttle on file writeback.

Could you please answer questions from
http://lkml.kernel.org/r/20161018114207.GD12092@dhcp22.suse.cz before
coming up with new and even more complex patches please?

I would really like to understand the issue you are seeing before
jumping into patches...

Thanks!
-- 
Michal Hocko
SUSE Labs
