linux-api.vger.kernel.org archive mirror
* Re: [RFC PATCH 1/2] mm, vmstat: hide /proc/pagetypeinfo from normal users
       [not found]   ` <20191023102737.32274-2-mhocko@kernel.org>
@ 2019-10-23 16:15     ` Vlastimil Babka
  0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-10-23 16:15 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Mel Gorman, Waiman Long
  Cc: Johannes Weiner, Roman Gushchin, Konstantin Khlebnikov, Jann Horn,
	Song Liu, Greg Kroah-Hartman, Rafael Aquini, linux-mm, LKML,
	Michal Hocko, Linux API

+ linux-api

On 10/23/19 12:27 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> /proc/pagetypeinfo is a debugging tool for examining internal page
> allocator state with respect to fragmentation. It is not very useful
> for anything else, so normal users really do not need to read this file.
> 
> Waiman Long has noticed that reading this file can have negative side
> effects because zone->lock is necessary for gathering data and that
> a) interferes with the page allocator and its users and b) can lead to
> hard lockups on large machines which have very long free lists.
> 
> Reduce both issues by simply not exporting the file to regular users.
> 
> Reported-by: Waiman Long <longman@redhat.com>
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmstat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6afc892a148a..4e885ecd44d1 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1972,7 +1972,7 @@ void __init init_mm_internals(void)
>  #endif
>  #ifdef CONFIG_PROC_FS
>  	proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op);
> -	proc_create_seq("pagetypeinfo", 0444, NULL, &pagetypeinfo_op);
> +	proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);
>  	proc_create_seq("vmstat", 0444, NULL, &vmstat_op);
>  	proc_create_seq("zoneinfo", 0444, NULL, &zoneinfo_op);
>  #endif
> 


* Re: [RFC PATCH 2/2] mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo
       [not found]   ` <20191023102737.32274-3-mhocko@kernel.org>
@ 2019-10-23 16:15     ` Vlastimil Babka
  2019-10-23 17:34     ` [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo Waiman Long
  2019-10-23 17:34     ` [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo Waiman Long
  2 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-10-23 16:15 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Mel Gorman, Waiman Long
  Cc: Johannes Weiner, Roman Gushchin, Konstantin Khlebnikov, Jann Horn,
	Song Liu, Greg Kroah-Hartman, Rafael Aquini, linux-mm, LKML,
	Michal Hocko, Linux API

+ linux-api

On 10/23/19 12:27 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> pagetypeinfo_showfree_print is called with zone->lock held and
> interrupts disabled. This is not really nice because it blocks both
> interrupts on that cpu and the page allocator. On large machines this
> might even trigger the hard lockup detector.
> 
> Considering that pagetypeinfo is a debugging tool, we do not really need
> exact numbers here. The primary reason to look at the output is to see
> how pageblocks are spread among different migratetypes, therefore putting
> a bound on the number of pages on the free_list sounds like a reasonable
> tradeoff.
> 
> The new output will simply tell
> [...]
> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
> 
> instead of
> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
> 
> The limit has been chosen arbitrarily and it is subject to future
> change should there be a need for that.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmstat.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e885ecd44d1..762034fc3b83 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1386,8 +1386,25 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  
>  			area = &(zone->free_area[order]);
>  
> -			list_for_each(curr, &area->free_list[mtype])
> +			list_for_each(curr, &area->free_list[mtype]) {
>  				freecount++;
> +				/*
> +				 * Cap the free_list iteration because it might
> +				 * be really large and we are under a spinlock
> +				 * so a long time spent here could trigger a
> +				 * hard lockup detector. Anyway this is a
> +				 * debugging tool so knowing there is a handful
> +				 * of pages in this order should be more than
> +				 * sufficient
> +				 */
> +				if (freecount > 100000) {
> +					seq_printf(m, ">%6lu ", freecount);
> +					spin_unlock_irq(&zone->lock);
> +					cond_resched();
> +					spin_lock_irq(&zone->lock);
> +					continue;
> +				}
> +			}
>  			seq_printf(m, "%6lu ", freecount);
>  		}
>  		seq_putc(m, '\n');
> 

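The capping idea can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the kernel implementation:
struct node is a hypothetical stand-in for the kernel's list, locking
is not modelled at all, and the function name is made up. The loop
follows the overflow-flag-and-break shape that is visible in the
context lines of the later patches in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a free list entry (singly linked here). */
struct node {
	struct node *next;
};

/*
 * Count entries on a NULL-terminated list, but stop after `cap` entries.
 * Sets *overflow when the list holds more than `cap` entries; in that
 * case the return value is `cap` and should be read as "> cap".
 */
static unsigned long count_capped(const struct node *head, unsigned long cap,
				  bool *overflow)
{
	unsigned long count = 0;

	*overflow = false;
	for (const struct node *cur = head; cur; cur = cur->next) {
		if (count == cap) {
			*overflow = true;
			break;
		}
		count++;
	}
	return count;
}
```

A caller would print ">" in front of the number when overflow is set,
which gives the ">100000" columns shown in the example output above.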

* [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo
       [not found]   ` <20191023102737.32274-3-mhocko@kernel.org>
  2019-10-23 16:15     ` [RFC PATCH 2/2] mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo Vlastimil Babka
@ 2019-10-23 17:34     ` Waiman Long
  2019-10-23 18:01       ` Michal Hocko
  2019-10-23 17:34     ` [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo Waiman Long
  2 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2019-10-23 17:34 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Mel Gorman
  Cc: linux-mm, linux-kernel, linux-api, Johannes Weiner,
	Roman Gushchin, Vlastimil Babka, Konstantin Khlebnikov, Jann Horn,
	Song Liu, Greg Kroah-Hartman, Rafael Aquini, Waiman Long

With a threshold of 100000, it is still possible that the zone lock
will be held for a very long time in the worst case scenario where all
the counts are just below the threshold. With up to 6 migration types
and 11 orders, that means up to 6.6 million list iterations.

Track the total number of list iterations done since the acquisition
of the zone lock and release it whenever 100000 iterations or more have
been completed. This will cap the lock hold time to no more than 200,000
list iterations.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/vmstat.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 57ba091e5460..c5b82fdf54af 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1373,6 +1373,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					pg_data_t *pgdat, struct zone *zone)
 {
 	int order, mtype;
+	unsigned long iteration_count = 0;
 
 	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
 		seq_printf(m, "Node %4d, zone %8s, type %12s ",
@@ -1397,15 +1398,24 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 				 * of pages in this order should be more than
 				 * sufficient
 				 */
-				if (++freecount >= 100000) {
+				if (++freecount > 100000) {
 					overflow = true;
-					spin_unlock_irq(&zone->lock);
-					cond_resched();
-					spin_lock_irq(&zone->lock);
+					freecount--;
 					break;
 				}
 			}
 			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
+			/*
+			 * Take a break and release the zone lock when
+			 * 100000 or more entries have been iterated.
+			 */
+			iteration_count += freecount;
+			if (iteration_count >= 100000) {
+				iteration_count = 0;
+				spin_unlock_irq(&zone->lock);
+				cond_resched();
+				spin_lock_irq(&zone->lock);
+			}
 		}
 		seq_putc(m, '\n');
 	}
-- 
2.18.1

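The arithmetic behind the two bounds in the commit message can be
checked with a small userspace model. Without batching, the worst case
is 6 x 11 x 100000 = 6,600,000 iterations under one lock hold. With
batching, the running total is below 100000 before a list is walked and
each list adds at most 100000, so one hold stays below 200,000. The
sketch below is only a model of the counting, with a hypothetical
function name; it does not take any lock.

```c
#include <stddef.h>

/*
 * Hypothetical model of the batching described above. `lens` holds the
 * (already capped) number of entries walked on each free list, in visit
 * order. The lock is considered dropped once the running total reaches
 * `batch`. Returns the largest number of iterations done in one hold.
 */
static unsigned long max_iterations_per_hold(const unsigned long *lens,
					     size_t n, unsigned long batch)
{
	unsigned long held = 0, worst = 0;

	for (size_t i = 0; i < n; i++) {
		held += lens[i];
		if (held > worst)
			worst = held;
		if (held >= batch)
			held = 0;	/* lock released and re-acquired here */
	}
	return worst;
}
```

Feeding it 66 lists of 99999 entries each gives 199998 with a batch of
100000, and 6599934 when the batch is set so high that the lock is never
dropped.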

* [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo
       [not found]   ` <20191023102737.32274-3-mhocko@kernel.org>
  2019-10-23 16:15     ` [RFC PATCH 2/2] mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo Vlastimil Babka
  2019-10-23 17:34     ` [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo Waiman Long
@ 2019-10-23 17:34     ` Waiman Long
  2019-10-23 18:02       ` Michal Hocko
  2 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2019-10-23 17:34 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Mel Gorman
  Cc: linux-mm, linux-kernel, linux-api, Johannes Weiner,
	Roman Gushchin, Vlastimil Babka, Konstantin Khlebnikov, Jann Horn,
	Song Liu, Greg Kroah-Hartman, Rafael Aquini, Waiman Long

The free block count for each migration type in /proc/pagetypeinfo
may no longer be exact if it exceeds 100,000, so users cannot tell how
much larger the real count is. As the free_area structure already
tracks the total free block count in nr_free, we may as well print it
out at no additional cost. That will give users a rough idea of where
the upper bound lies.

If there is no overflow, the presence of the total counts will also
enable us to check if the nr_free counts match the total number of
entries in the free lists.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/vmstat.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c5b82fdf54af..172946d8f358 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1373,6 +1373,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					pg_data_t *pgdat, struct zone *zone)
 {
 	int order, mtype;
+	struct free_area *area;
 	unsigned long iteration_count = 0;
 
 	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
@@ -1382,7 +1383,6 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					migratetype_names[mtype]);
 		for (order = 0; order < MAX_ORDER; ++order) {
 			unsigned long freecount = 0;
-			struct free_area *area;
 			struct list_head *curr;
 			bool overflow = false;
 
@@ -1419,6 +1419,17 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 		}
 		seq_putc(m, '\n');
 	}
+
+	/*
+	 * List total free blocks per order
+	 */
+	seq_printf(m, "Node %4d, zone %8s, total             ",
+		   pgdat->node_id, zone->name);
+	for (order = 0; order < MAX_ORDER; ++order) {
+		area = &(zone->free_area[order]);
+		seq_printf(m, "%6lu ", area->nr_free);
+	}
+	seq_putc(m, '\n');
 }
 
 /* Print out the free pages at each order for each migatetype */
-- 
2.18.1

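The consistency check mentioned in the commit message could look
roughly like this for a single order. It is a userspace sketch with
hypothetical names, and it assumes the per-type counts and nr_free were
sampled from one consistent snapshot, which the real code does not
guarantee once the zone lock is dropped between lists. N_MTYPES is an
assumption taken from the "up to 6 migration types" in patch 1/2.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_MTYPES 6	/* assumption: "up to 6 migration types" */

/*
 * Hypothetical check for one order: counts[t] is the number of entries
 * counted on the free list of migrate type t, overflow[t] says whether
 * that count was capped. Returns true when the counts are consistent
 * with nr_free.
 */
static bool counts_consistent(const unsigned long counts[N_MTYPES],
			      const bool overflow[N_MTYPES],
			      unsigned long nr_free)
{
	unsigned long sum = 0;
	bool capped = false;

	for (size_t t = 0; t < N_MTYPES; t++) {
		sum += counts[t];
		capped = capped || overflow[t];
	}
	/* Exact match without overflow; otherwise the sum is a strict lower bound. */
	return capped ? sum < nr_free : sum == nr_free;
}
```

When any list overflowed, nr_free minus the sum gives an idea of how
many entries were left uncounted.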

* Re: [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo
  2019-10-23 17:34     ` [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo Waiman Long
@ 2019-10-23 18:01       ` Michal Hocko
  2019-10-23 18:14         ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-10-23 18:01 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, linux-api,
	Johannes Weiner, Roman Gushchin, Vlastimil Babka,
	Konstantin Khlebnikov, Jann Horn, Song Liu, Greg Kroah-Hartman,
	Rafael Aquini

On Wed 23-10-19 13:34:22, Waiman Long wrote:
> With a threshold of 100000, it is still possible that the zone lock
> will be held for a very long time in the worst case scenario where all
> the counts are just below the threshold. With up to 6 migration types
> and 11 orders, that means up to 6.6 million list iterations.
> 
> Track the total number of list iterations done since the acquisition
> of the zone lock and release it whenever 100000 iterations or more have
> been completed. This will cap the lock hold time to no more than 200,000
> list iterations.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/vmstat.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 57ba091e5460..c5b82fdf54af 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1373,6 +1373,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  					pg_data_t *pgdat, struct zone *zone)
>  {
>  	int order, mtype;
> +	unsigned long iteration_count = 0;
>  
>  	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
>  		seq_printf(m, "Node %4d, zone %8s, type %12s ",
> @@ -1397,15 +1398,24 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  				 * of pages in this order should be more than
>  				 * sufficient
>  				 */
> -				if (++freecount >= 100000) {
> +				if (++freecount > 100000) {
>  					overflow = true;
> -					spin_unlock_irq(&zone->lock);
> -					cond_resched();
> -					spin_lock_irq(&zone->lock);
> +					freecount--;
>  					break;
>  				}
>  			}
>  			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			/*
> +			 * Take a break and release the zone lock when
> +			 * 100000 or more entries have been iterated.
> +			 */
> +			iteration_count += freecount;
> +			if (iteration_count >= 100000) {
> +				iteration_count = 0;
> +				spin_unlock_irq(&zone->lock);
> +				cond_resched();
> +				spin_lock_irq(&zone->lock);
> +			}

Aren't you overengineering this a bit? If you are still worried then we
can simply cond_resched for each order:

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c156ce24a322..ddb89f4e0486 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1399,13 +1399,13 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 				 */
 				if (++freecount >= 100000) {
 					overflow = true;
-					spin_unlock_irq(&zone->lock);
-					cond_resched();
-					spin_lock_irq(&zone->lock);
 					break;
 				}
 			}
 			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
+			spin_unlock_irq(&zone->lock);
+			cond_resched();
+			spin_lock_irq(&zone->lock);
 		}
 		seq_putc(m, '\n');
 	}

I do not have a strong opinion here but I can fold this into my patch 2.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo
  2019-10-23 17:34     ` [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo Waiman Long
@ 2019-10-23 18:02       ` Michal Hocko
  2019-10-23 18:07         ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-10-23 18:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, linux-api,
	Johannes Weiner, Roman Gushchin, Vlastimil Babka,
	Konstantin Khlebnikov, Jann Horn, Song Liu, Greg Kroah-Hartman,
	Rafael Aquini

On Wed 23-10-19 13:34:23, Waiman Long wrote:
[...]
> @@ -1419,6 +1419,17 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  		}
>  		seq_putc(m, '\n');
>  	}
> +
> +	/*
> +	 * List total free blocks per order
> +	 */
> +	seq_printf(m, "Node %4d, zone %8s, total             ",
> +		   pgdat->node_id, zone->name);
> +	for (order = 0; order < MAX_ORDER; ++order) {
> +		area = &(zone->free_area[order]);
> +		seq_printf(m, "%6lu ", area->nr_free);
> +	}
> +	seq_putc(m, '\n');

This is essentially duplicating /proc/buddyinfo. Do we really need that?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/2] mm, vmstat: List total free blocks for each order in /proc/pagetypeinfo
  2019-10-23 18:02       ` Michal Hocko
@ 2019-10-23 18:07         ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2019-10-23 18:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, linux-api,
	Johannes Weiner, Roman Gushchin, Vlastimil Babka,
	Konstantin Khlebnikov, Jann Horn, Song Liu, Greg Kroah-Hartman,
	Rafael Aquini

On 10/23/19 2:02 PM, Michal Hocko wrote:
> On Wed 23-10-19 13:34:23, Waiman Long wrote:
> [...]
>> @@ -1419,6 +1419,17 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>>  		}
>>  		seq_putc(m, '\n');
>>  	}
>> +
>> +	/*
>> +	 * List total free blocks per order
>> +	 */
>> +	seq_printf(m, "Node %4d, zone %8s, total             ",
>> +		   pgdat->node_id, zone->name);
>> +	for (order = 0; order < MAX_ORDER; ++order) {
>> +		area = &(zone->free_area[order]);
>> +		seq_printf(m, "%6lu ", area->nr_free);
>> +	}
>> +	seq_putc(m, '\n');
> This is essentially duplicating /proc/buddyinfo. Do we really need that?

Yes, you are right. As the information is available elsewhere, I am fine
with dropping this.

Cheers,
Longman


* Re: [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo
  2019-10-23 18:01       ` Michal Hocko
@ 2019-10-23 18:14         ` Waiman Long
  2019-10-23 20:02           ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2019-10-23 18:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, linux-api,
	Johannes Weiner, Roman Gushchin, Vlastimil Babka,
	Konstantin Khlebnikov, Jann Horn, Song Liu, Greg Kroah-Hartman,
	Rafael Aquini

On 10/23/19 2:01 PM, Michal Hocko wrote:
> On Wed 23-10-19 13:34:22, Waiman Long wrote:
>> With a threshold of 100000, it is still possible that the zone lock
>> will be held for a very long time in the worst case scenario where all
>> the counts are just below the threshold. With up to 6 migration types
>> and 11 orders, that means up to 6.6 million list iterations.
>>
>> Track the total number of list iterations done since the acquisition
>> of the zone lock and release it whenever 100000 iterations or more have
>> been completed. This will cap the lock hold time to no more than 200,000
>> list iterations.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  mm/vmstat.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 57ba091e5460..c5b82fdf54af 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1373,6 +1373,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>>  					pg_data_t *pgdat, struct zone *zone)
>>  {
>>  	int order, mtype;
>> +	unsigned long iteration_count = 0;
>>  
>>  	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
>>  		seq_printf(m, "Node %4d, zone %8s, type %12s ",
>> @@ -1397,15 +1398,24 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>>  				 * of pages in this order should be more than
>>  				 * sufficient
>>  				 */
>> -				if (++freecount >= 100000) {
>> +				if (++freecount > 100000) {
>>  					overflow = true;
>> -					spin_unlock_irq(&zone->lock);
>> -					cond_resched();
>> -					spin_lock_irq(&zone->lock);
>> +					freecount--;
>>  					break;
>>  				}
>>  			}
>>  			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
>> +			/*
>> +			 * Take a break and release the zone lock when
>> +			 * 100000 or more entries have been iterated.
>> +			 */
>> +			iteration_count += freecount;
>> +			if (iteration_count >= 100000) {
>> +				iteration_count = 0;
>> +				spin_unlock_irq(&zone->lock);
>> +				cond_resched();
>> +				spin_lock_irq(&zone->lock);
>> +			}
> Aren't you overengineering this a bit? If you are still worried then we
> can simply cond_resched for each order
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index c156ce24a322..ddb89f4e0486 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1399,13 +1399,13 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  				 */
>  				if (++freecount >= 100000) {
>  					overflow = true;
> -					spin_unlock_irq(&zone->lock);
> -					cond_resched();
> -					spin_lock_irq(&zone->lock);
>  					break;
>  				}
>  			}
>  			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			spin_unlock_irq(&zone->lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lock);
>  		}
>  		seq_putc(m, '\n');
>  	}
>
> I do not have a strong opinion here but I can fold this into my patch 2.

If the free list is empty or is very short, there is probably no need to
release and reacquire the lock. How about adding a check for a lower
bound like:

if (freecount > 1000) {
    spin_unlock_irq(&zone->lock);
    cond_resched();
    spin_lock_irq(&zone->lock);
}

Cheers,
Longman

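To compare the two proposals, here is a rough userspace model of the
lower-bound variant. It counts how often the lock would be dropped and
how many entries are walked under one hold. Names are hypothetical and
no real locking is involved. Dropping the lock unconditionally after
every list, as in the diff quoted above, would simply give one drop per
list.

```c
#include <stddef.h>

struct hold_stats {
	size_t drops;			/* times the lock was released */
	unsigned long worst_hold;	/* most entries walked under one hold */
};

/*
 * Hypothetical model: lens[i] is the number of entries walked on the
 * i-th free list. The lock is dropped after a list only when more than
 * min_len entries were walked on it.
 */
static struct hold_stats simulate_lower_bound(const unsigned long *lens,
					      size_t n, unsigned long min_len)
{
	struct hold_stats st = { 0, 0 };
	unsigned long held = 0;

	for (size_t i = 0; i < n; i++) {
		held += lens[i];
		if (held > st.worst_hold)
			st.worst_hold = held;
		if (lens[i] > min_len) {	/* the proposed freecount > 1000 check */
			st.drops++;
			held = 0;
		}
	}
	return st;
}
```

In this model, 65 lists of exactly 1000 entries followed by one list of
100000 entries lead to a single drop and a worst hold of 165000
iterations, so the lower bound trades fewer lock round trips for a
somewhat longer worst-case hold.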

* Re: [PATCH 1/2] mm, vmstat: Release zone lock more frequently when reading /proc/pagetypeinfo
  2019-10-23 18:14         ` Waiman Long
@ 2019-10-23 20:02           ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2019-10-23 20:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, linux-api,
	Johannes Weiner, Roman Gushchin, Vlastimil Babka,
	Konstantin Khlebnikov, Jann Horn, Song Liu, Greg Kroah-Hartman,
	Rafael Aquini

On Wed 23-10-19 14:14:14, Waiman Long wrote:
> On 10/23/19 2:01 PM, Michal Hocko wrote:
> > On Wed 23-10-19 13:34:22, Waiman Long wrote:
> >> With a threshold of 100000, it is still possible that the zone lock
> >> will be held for a very long time in the worst case scenario where all
> >> the counts are just below the threshold. With up to 6 migration types
> >> and 11 orders, that means up to 6.6 million list iterations.
> >>
> >> Track the total number of list iterations done since the acquisition
> >> of the zone lock and release it whenever 100000 iterations or more have
> >> been completed. This will cap the lock hold time to no more than 200,000
> >> list iterations.
> >>
> >> Signed-off-by: Waiman Long <longman@redhat.com>
> >> ---
> >>  mm/vmstat.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/vmstat.c b/mm/vmstat.c
> >> index 57ba091e5460..c5b82fdf54af 100644
> >> --- a/mm/vmstat.c
> >> +++ b/mm/vmstat.c
> >> @@ -1373,6 +1373,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> >>  					pg_data_t *pgdat, struct zone *zone)
> >>  {
> >>  	int order, mtype;
> >> +	unsigned long iteration_count = 0;
> >>  
> >>  	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
> >>  		seq_printf(m, "Node %4d, zone %8s, type %12s ",
> >> @@ -1397,15 +1398,24 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> >>  				 * of pages in this order should be more than
> >>  				 * sufficient
> >>  				 */
> >> -				if (++freecount >= 100000) {
> >> +				if (++freecount > 100000) {
> >>  					overflow = true;
> >> -					spin_unlock_irq(&zone->lock);
> >> -					cond_resched();
> >> -					spin_lock_irq(&zone->lock);
> >> +					freecount--;
> >>  					break;
> >>  				}
> >>  			}
> >>  			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> >> +			/*
> >> +			 * Take a break and release the zone lock when
> >> +			 * 100000 or more entries have been iterated.
> >> +			 */
> >> +			iteration_count += freecount;
> >> +			if (iteration_count >= 100000) {
> >> +				iteration_count = 0;
> >> +				spin_unlock_irq(&zone->lock);
> >> +				cond_resched();
> >> +				spin_lock_irq(&zone->lock);
> >> +			}
> > Aren't you overengineering this a bit? If you are still worried then we
> > can simply cond_resched for each order
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index c156ce24a322..ddb89f4e0486 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1399,13 +1399,13 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> >  				 */
> >  				if (++freecount >= 100000) {
> >  					overflow = true;
> > -					spin_unlock_irq(&zone->lock);
> > -					cond_resched();
> > -					spin_lock_irq(&zone->lock);
> >  					break;
> >  				}
> >  			}
> >  			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> > +			spin_unlock_irq(&zone->lock);
> > +			cond_resched();
> > +			spin_lock_irq(&zone->lock);
> >  		}
> >  		seq_putc(m, '\n');
> >  	}
> >
> > I do not have a strong opinion here but I can fold this into my patch 2.
> 
> If the free list is empty or is very short, there is probably no need to
> release and reacquire the lock. How about adding a check for a lower
> bound like:

Again, does it really make any sense to micro-optimize something like
this? It is a debugging tool. I would rather go simple.
-- 
Michal Hocko
SUSE Labs
