* [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats
2015-10-28 2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
@ 2015-10-28 2:41 ` Christoph Lameter
2015-10-28 2:41 ` [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle Christoph Lameter
2015-10-28 2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
2 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 2:41 UTC (permalink / raw)
To: akpm
Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
torvalds, hannes, mgorman
[-- Attachment #1: vmstat_do_pageset_parameter --]
[-- Type: text/plain, Size: 2579 bytes --]
Add a parameter to refresh_cpu_vm_stats() to make pageset expiration
optional. Flushing the pagesets is performed by the page allocator
and thus processing of pagesets may not be wanted when just intending
to fold the differentials.
Signed-of-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -460,7 +460,7 @@ static int fold_diff(int *diff)
*
* The function returns the number of global counters updated.
*/
-static int refresh_cpu_vm_stats(void)
+static int refresh_cpu_vm_stats(bool do_pagesets)
{
struct zone *zone;
int i;
@@ -484,33 +484,35 @@ static int refresh_cpu_vm_stats(void)
#endif
}
}
- cond_resched();
#ifdef CONFIG_NUMA
- /*
- * Deal with draining the remote pageset of this
- * processor
- *
- * Check if there are pages remaining in this pageset
- * if not then there is nothing to expire.
- */
- if (!__this_cpu_read(p->expire) ||
+ if (do_pagesets) {
+ cond_resched();
+ /*
+ * Deal with draining the remote pageset of this
+ * processor
+ *
+ * Check if there are pages remaining in this pageset
+ * if not then there is nothing to expire.
+ */
+ if (!__this_cpu_read(p->expire) ||
!__this_cpu_read(p->pcp.count))
- continue;
+ continue;
- /*
- * We never drain zones local to this processor.
- */
- if (zone_to_nid(zone) == numa_node_id()) {
- __this_cpu_write(p->expire, 0);
- continue;
- }
+ /*
+ * We never drain zones local to this processor.
+ */
+ if (zone_to_nid(zone) == numa_node_id()) {
+ __this_cpu_write(p->expire, 0);
+ continue;
+ }
- if (__this_cpu_dec_return(p->expire))
- continue;
+ if (__this_cpu_dec_return(p->expire))
+ continue;
- if (__this_cpu_read(p->pcp.count)) {
- drain_zone_pages(zone, this_cpu_ptr(&p->pcp));
- changes++;
+ if (__this_cpu_read(p->pcp.count)) {
+ drain_zone_pages(zone, this_cpu_ptr(&p->pcp));
+ changes++;
+ }
}
#endif
}
@@ -1363,7 +1365,7 @@ static cpumask_var_t cpu_stat_off;
static void vmstat_update(struct work_struct *w)
{
- if (refresh_cpu_vm_stats()) {
+ if (refresh_cpu_vm_stats(true)) {
/*
* Counters were updated so we expect more updates
* to occur in the future. Keep on running the
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle
2015-10-28 2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
2015-10-28 2:41 ` [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats Christoph Lameter
@ 2015-10-28 2:41 ` Christoph Lameter
2015-10-28 2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
2 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 2:41 UTC (permalink / raw)
To: akpm
Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
torvalds, hannes, mgorman
[-- Attachment #1: vmstat_quiet_on_idle --]
[-- Type: text/plain, Size: 3248 bytes --]
Currently the vmstat updater is not deferrable as a result of commit
ba4877b9ca51f80b5d30f304a46762f0509e1635. This in turn can cause multiple
interruptions of the applications because the vmstat updater may run at
different times than tick processing. No good.
Make vmstate_update deferrable again and provide a function that
shuts down the vmstat updater when we go idle by folding the differentials.
Shut it down from the load average calculation logic introduced by nohz.
Note that the shepherd thread will continue scanning the differentials
from another processor and will reenable the vmstat workers if it
detects any changes.
Fixes: ba4877b9ca51f80b5d30f304a46762f0509e1635 (do not use deferrable delay)
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1397,6 +1397,20 @@ static void vmstat_update(struct work_st
}
/*
+ * Switch off vmstat processing and then fold all the remaining differentials
+ * until the diffs stay at zero. The function is used by NOHZ and can only be
+ * invoked when tick processing is not active.
+ */
+void quiet_vmstat(void)
+{
+ do {
+ if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+ cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+
+ } while (refresh_cpu_vm_stats(false));
+}
+
+/*
* Check if the diffs for a certain cpu indicate that
* an update is needed.
*/
@@ -1428,7 +1442,7 @@ static bool need_update(int cpu)
*/
static void vmstat_shepherd(struct work_struct *w);
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
static void vmstat_shepherd(struct work_struct *w)
{
Index: linux/include/linux/vmstat.h
===================================================================
--- linux.orig/include/linux/vmstat.h
+++ linux/include/linux/vmstat.h
@@ -211,6 +211,7 @@ extern void __inc_zone_state(struct zone
extern void dec_zone_state(struct zone *, enum zone_stat_item);
extern void __dec_zone_state(struct zone *, enum zone_stat_item);
+void quiet_vmstat(void);
void cpu_vm_stats_fold(int cpu);
void refresh_zone_stat_thresholds(void);
@@ -272,6 +273,7 @@ static inline void __dec_zone_page_state
static inline void refresh_cpu_vm_stats(int cpu) { }
static inline void refresh_zone_stat_thresholds(void) { }
static inline void cpu_vm_stats_fold(int cpu) { }
+static inline void quiet_vmstat(void) { }
static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -667,6 +667,7 @@ static ktime_t tick_nohz_stop_sched_tick
*/
if (!ts->tick_stopped) {
nohz_balance_enter_idle(cpu);
+ quiet_vmstat();
calc_load_enter_idle();
ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [patch 3/3] vmstat: Create our own workqueue
2015-10-28 2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
2015-10-28 2:41 ` [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats Christoph Lameter
2015-10-28 2:41 ` [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle Christoph Lameter
@ 2015-10-28 2:41 ` Christoph Lameter
2015-10-28 2:43 ` Tejun Heo
2 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 2:41 UTC (permalink / raw)
To: akpm
Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
torvalds, hannes, mgorman
[-- Attachment #1: vmstat_add_workqueue --]
[-- Type: text/plain, Size: 1997 bytes --]
Seems that vmstat needs its own workqueue now since the general
workqueue mechanism has been *enhanced* which means that the
vmstat_updates cannot run reliably but are being blocked by
work requests doing memory allocation. Which causes vmstat
to be unable to keep the counters up to date.
Bad. Fix this by creating our own workqueue.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1359,6 +1359,8 @@ static const struct file_operations proc
#endif /* CONFIG_PROC_FS */
#ifdef CONFIG_SMP
+static struct workqueue_struct *vmstat_wq;
+
static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
int sysctl_stat_interval __read_mostly = HZ;
static cpumask_var_t cpu_stat_off;
@@ -1371,7 +1373,7 @@ static void vmstat_update(struct work_st
* to occur in the future. Keep on running the
* update worker thread.
*/
- schedule_delayed_work_on(smp_processor_id(),
+ queue_delayed_work_on(smp_processor_id(), vmstat_wq,
this_cpu_ptr(&vmstat_work),
round_jiffies_relative(sysctl_stat_interval));
} else {
@@ -1454,7 +1456,7 @@ static void vmstat_shepherd(struct work_
if (need_update(cpu) &&
cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
- schedule_delayed_work_on(cpu,
+ queue_delayed_work_on(cpu, vmstat_wq,
&per_cpu(vmstat_work, cpu), 0);
put_online_cpus();
@@ -1543,6 +1545,10 @@ static int __init setup_vmstat(void)
start_shepherd_timer();
cpu_notifier_register_done();
+ vmstat_wq = alloc_workqueue("vmstat",
+ WQ_FREEZABLE|
+ WQ_SYSFS|
+ WQ_MEM_RECLAIM, 0);
#endif
#ifdef CONFIG_PROC_FS
proc_create("buddyinfo", S_IRUGO, NULL, &fragmentation_file_operations);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-28 2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
@ 2015-10-28 2:43 ` Tejun Heo
2015-10-28 3:04 ` Christoph Lameter
0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-28 2:43 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, Michal Hocko, Tetsuo Handa, linux-mm, linux-kernel,
torvalds, hannes, mgorman
Hello,
On Tue, Oct 27, 2015 at 09:41:17PM -0500, Christoph Lameter wrote:
> + vmstat_wq = alloc_workqueue("vmstat",
> + WQ_FREEZABLE|
> + WQ_SYSFS|
> + WQ_MEM_RECLAIM, 0);
The only thing necessary here is WQ_MEM_RECLAIM. I don't see how
WQ_SYSFS and WQ_FREEZABLE make sense here.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-28 2:43 ` Tejun Heo
@ 2015-10-28 3:04 ` Christoph Lameter
2015-10-28 11:57 ` Tetsuo Handa
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 3:04 UTC (permalink / raw)
To: Tejun Heo
Cc: akpm, Michal Hocko, Tetsuo Handa, linux-mm, linux-kernel,
torvalds, hannes, mgorman
On Wed, 28 Oct 2015, Tejun Heo wrote:
> The only thing necessary here is WQ_MEM_RECLAIM. I don't see how
> WQ_SYSFS and WQ_FREEZABLE make sense here.
Subject: vmstat: Remove WQ_FREEZABLE and WQ_SYSFS
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1546,8 +1546,6 @@ static int __init setup_vmstat(void)
start_shepherd_timer();
cpu_notifier_register_done();
vmstat_wq = alloc_workqueue("vmstat",
- WQ_FREEZABLE|
- WQ_SYSFS|
WQ_MEM_RECLAIM, 0);
#endif
#ifdef CONFIG_PROC_FS
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-28 3:04 ` Christoph Lameter
@ 2015-10-28 11:57 ` Tetsuo Handa
2015-10-28 22:32 ` Christoph Lameter
2015-10-29 2:24 ` Tejun Heo
0 siblings, 2 replies; 19+ messages in thread
From: Tetsuo Handa @ 2015-10-28 11:57 UTC (permalink / raw)
To: cl, htejun
Cc: akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman
Christoph Lameter wrote:
> On Wed, 28 Oct 2015, Tejun Heo wrote:
>
> > The only thing necessary here is WQ_MEM_RECLAIM. I don't see how
> > WQ_SYSFS and WQ_FREEZABLE make sense here.
>
I can still trigger silent livelock with this patchset applied.
----------
[ 272.283217] MemAlloc-Info: 9 stalling task, 0 dying task, 0 victim task.
[ 272.285089] MemAlloc: a.out(11325) gfp=0x24280ca order=0 delay=19164
[ 272.286817] MemAlloc: a.out(11326) gfp=0x242014a order=0 delay=19104
[ 272.288512] MemAlloc: vmtoolsd(1897) gfp=0x242014a order=0 delay=19072
[ 272.290280] MemAlloc: kworker/1:3(11286) gfp=0x2400000 order=0 delay=19056
[ 272.292114] MemAlloc: sshd(11202) gfp=0x242014a order=0 delay=18927
[ 272.293908] MemAlloc: tuned(2073) gfp=0x242014a order=0 delay=18799
[ 272.297360] MemAlloc: nmbd(4752) gfp=0x242014a order=0 delay=16532
[ 272.299115] MemAlloc: auditd(529) gfp=0x242014a order=0 delay=13073
[ 272.302248] MemAlloc: irqbalance(1696) gfp=0x242014a order=0 delay=10529
(...snipped...)
[ 272.851035] Showing busy workqueues and worker pools:
[ 272.852583] workqueue events: flags=0x0
[ 272.853942] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 272.855781] pending: vmw_fb_dirty_flush [vmwgfx]
[ 272.857500] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 272.859359] pending: vmpressure_work_fn
[ 272.860840] workqueue events_freezable_power_: flags=0x84
[ 272.862461] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 272.864479] in-flight: 11286:disk_events_workfn
[ 272.866065] pending: disk_events_workfn
[ 272.867587] workqueue vmstat: flags=0x8
[ 272.868942] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 272.870785] pending: vmstat_update
[ 272.872248] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=4 idle: 14 218 43
----------
> 2. Create a separate workqueue so that the vmstat updater
> is not blocked by other work requeusts. This creates a
> new kernel thread <sigh> and avoids the issue of
> differentials not folded in a timely fashion.
Did you really mean "the vmstat updater is not blocked by other
work requeusts"?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-28 11:57 ` Tetsuo Handa
@ 2015-10-28 22:32 ` Christoph Lameter
2015-10-29 2:24 ` Tejun Heo
1 sibling, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 22:32 UTC (permalink / raw)
To: Tetsuo Handa
Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
On Wed, 28 Oct 2015, Tetsuo Handa wrote:
> Christoph Lameter wrote:
> > On Wed, 28 Oct 2015, Tejun Heo wrote:
> >
> > > The only thing necessary here is WQ_MEM_RECLAIM. I don't see how
> > > WQ_SYSFS and WQ_FREEZABLE make sense here.
> >
> I can still trigger silent livelock with this patchset applied.
Ok so why the vmstat updater still deferred, Tejun?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-28 11:57 ` Tetsuo Handa
2015-10-28 22:32 ` Christoph Lameter
@ 2015-10-29 2:24 ` Tejun Heo
2015-10-29 3:08 ` Tejun Heo
1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-29 2:24 UTC (permalink / raw)
To: Tetsuo Handa
Cc: cl, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
Hello,
That's weird.
On Wed, Oct 28, 2015 at 08:57:28PM +0900, Tetsuo Handa wrote:
> [ 272.851035] Showing busy workqueues and worker pools:
> [ 272.852583] workqueue events: flags=0x0
> [ 272.853942] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
> [ 272.855781] pending: vmw_fb_dirty_flush [vmwgfx]
> [ 272.857500] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 272.859359] pending: vmpressure_work_fn
> [ 272.860840] workqueue events_freezable_power_: flags=0x84
> [ 272.862461] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> [ 272.864479] in-flight: 11286:disk_events_workfn
What's this guy doing? Can you get stack dump on 11286 (or whatever
is in flight in the next lockup)?
> [ 272.866065] pending: disk_events_workfn
> [ 272.867587] workqueue vmstat: flags=0x8
> [ 272.868942] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 272.870785] pending: vmstat_update
> [ 272.872248] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=4 idle: 14 218 43
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-29 2:24 ` Tejun Heo
@ 2015-10-29 3:08 ` Tejun Heo
2015-10-30 1:01 ` Christoph Lameter
0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-29 3:08 UTC (permalink / raw)
To: Tetsuo Handa
Cc: cl, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
On Thu, Oct 29, 2015 at 11:24:47AM +0900, Tejun Heo wrote:
> Hello,
>
> That's weird.
>
> On Wed, Oct 28, 2015 at 08:57:28PM +0900, Tetsuo Handa wrote:
> > [ 272.851035] Showing busy workqueues and worker pools:
> > [ 272.852583] workqueue events: flags=0x0
> > [ 272.853942] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
> > [ 272.855781] pending: vmw_fb_dirty_flush [vmwgfx]
> > [ 272.857500] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> > [ 272.859359] pending: vmpressure_work_fn
> > [ 272.860840] workqueue events_freezable_power_: flags=0x84
> > [ 272.862461] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> > [ 272.864479] in-flight: 11286:disk_events_workfn
>
> What's this guy doing? Can you get stack dump on 11286 (or whatever
> is in flight in the next lockup)?
Wait, this series doesn't include Tetsuo's change. Of course it won't
fix the deadlock problem. What's necessary is Tetsuo's patch +
WQ_MEM_RECLAIM.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-29 3:08 ` Tejun Heo
@ 2015-10-30 1:01 ` Christoph Lameter
2015-10-31 1:15 ` Tejun Heo
2015-10-31 2:43 ` Tetsuo Handa
0 siblings, 2 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-30 1:01 UTC (permalink / raw)
To: Tejun Heo
Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
hannes, mgorman
On Thu, 29 Oct 2015, Tejun Heo wrote:
> Wait, this series doesn't include Tetsuo's change. Of course it won't
> fix the deadlock problem. What's necessary is Tetsuo's patch +
> WQ_MEM_RECLAIM.
This series is only dealing with vmstat changes. Do I get an ack here?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-30 1:01 ` Christoph Lameter
@ 2015-10-31 1:15 ` Tejun Heo
2015-10-31 2:43 ` Tetsuo Handa
1 sibling, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2015-10-31 1:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
hannes, mgorman
On Thu, Oct 29, 2015 at 08:01:12PM -0500, Christoph Lameter wrote:
> On Thu, 29 Oct 2015, Tejun Heo wrote:
>
> > Wait, this series doesn't include Tetsuo's change. Of course it won't
> > fix the deadlock problem. What's necessary is Tetsuo's patch +
> > WQ_MEM_RECLAIM.
>
> This series is only dealing with vmstat changes. Do I get an ack here?
Yeap, please feel free to add my acked-by.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-30 1:01 ` Christoph Lameter
2015-10-31 1:15 ` Tejun Heo
@ 2015-10-31 2:43 ` Tetsuo Handa
2015-11-02 16:12 ` Christoph Lameter
1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-10-31 2:43 UTC (permalink / raw)
To: cl; +Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
Christoph Lameter wrote:
> On Thu, 29 Oct 2015, Tejun Heo wrote:
>
> > Wait, this series doesn't include Tetsuo's change. Of course it won't
> > fix the deadlock problem. What's necessary is Tetsuo's patch +
> > WQ_MEM_RECLAIM.
>
> This series is only dealing with vmstat changes. Do I get an ack here?
>
Then, you need to update below description (or drop it) because
patch 3/3 alone will not guarantee that the counters are up to date.
Christoph Lameter wrote:
> Seems that vmstat needs its own workqueue now since the general
> workqueue mechanism has been *enhanced* which means that the
> vmstat_updates cannot run reliably but are being blocked by
> work requests doing memory allocation. Which causes vmstat
> to be unable to keep the counters up to date.
I am waiting for decision from candidates listed at
http://lkml.kernel.org/r/201510251952.CEF04109.OSOtLFHFVFJMQO@I-love.SAKURA.ne.jp .
If your series is not for backporting, please choose one from
the candidates. Can you accept the original patch at
http://lkml.kernel.org/r/201510212126.JIF90648.HOOFJVFQLMStOF@I-love.SAKURA.ne.jp
which implements (1) from the candidates?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-10-31 2:43 ` Tetsuo Handa
@ 2015-11-02 16:12 ` Christoph Lameter
2015-11-02 16:52 ` Tetsuo Handa
2015-11-06 11:28 ` Tetsuo Handa
0 siblings, 2 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-11-02 16:12 UTC (permalink / raw)
To: Tetsuo Handa
Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
On Sat, 31 Oct 2015, Tetsuo Handa wrote:
> Then, you need to update below description (or drop it) because
> patch 3/3 alone will not guarantee that the counters are up to date.
The vmstat system does not guarantee that the counters are up to date
always. The whole point is the deferral of updates for performance
reasons. They are updated *at some point* within stat_interval. That needs
to happen and that is what this patchset is fixing.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-11-02 16:12 ` Christoph Lameter
@ 2015-11-02 16:52 ` Tetsuo Handa
2015-11-02 18:10 ` Christoph Lameter
2015-11-06 11:28 ` Tetsuo Handa
1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-11-02 16:52 UTC (permalink / raw)
To: cl; +Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
Christoph Lameter wrote:
> On Sat, 31 Oct 2015, Tetsuo Handa wrote:
>
> > Then, you need to update below description (or drop it) because
> > patch 3/3 alone will not guarantee that the counters are up to date.
>
> The vmstat system does not guarantee that the counters are up to date
> always. The whole point is the deferral of updates for performance
> reasons. They are updated *at some point* within stat_interval. That needs
> to happen and that is what this patchset is fixing.
>
I'm still unclear. I think that the result of this patchset is
The counters are never updated even after stat_interval
if some workqueue item is doing a __GFP_WAIT memory allocation.
but the patch description sounds as if
The counters will be updated even if some workqueue item is
doing a __GFP_WAIT memory allocation.
which denies the actual result I tested with this patchset applied.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-11-02 16:52 ` Tetsuo Handa
@ 2015-11-02 18:10 ` Christoph Lameter
2015-11-02 19:11 ` Tejun Heo
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-11-02 18:10 UTC (permalink / raw)
To: Tetsuo Handa
Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
On Tue, 3 Nov 2015, Tetsuo Handa wrote:
> I'm still unclear. I think that the result of this patchset is
>
> The counters are never updated even after stat_interval
> if some workqueue item is doing a __GFP_WAIT memory allocation.
>
> but the patch description sounds as if
>
> The counters will be updated even if some workqueue item is
> doing a __GFP_WAIT memory allocation.
>
> which denies the actual result I tested with this patchset applied.
Well true that is dependend on the correct workqueue operation. I though
that was fixed by Tejun?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-11-02 18:10 ` Christoph Lameter
@ 2015-11-02 19:11 ` Tejun Heo
0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2015-11-02 19:11 UTC (permalink / raw)
To: Christoph Lameter
Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
hannes, mgorman
On Mon, Nov 02, 2015 at 12:10:04PM -0600, Christoph Lameter wrote:
> Well true that is dependend on the correct workqueue operation. I though
> that was fixed by Tejun?
At least for now, we're going with Tetsuo's short sleep patch.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-11-02 16:12 ` Christoph Lameter
2015-11-02 16:52 ` Tetsuo Handa
@ 2015-11-06 11:28 ` Tetsuo Handa
2015-11-06 12:56 ` Christoph Lameter
1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-11-06 11:28 UTC (permalink / raw)
To: cl; +Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
Christoph Lameter wrote:
> On Sat, 31 Oct 2015, Tetsuo Handa wrote:
>
> > Then, you need to update below description (or drop it) because
> > patch 3/3 alone will not guarantee that the counters are up to date.
>
> The vmstat system does not guarantee that the counters are up to date
> always. The whole point is the deferral of updates for performance
> reasons. They are updated *at some point* within stat_interval. That needs
> to happen and that is what this patchset is fixing.
So, if you refer to the blocking of the execution of vmstat updates,
description for patch 3/3 sould be updated to something like below?
----------
Since __GFP_WAIT memory allocations do not call schedule()
when there is nothing to reclaim, and workqueue does not kick
remaining workqueue items unless in-flight workqueue item calls
schedule(), __GFP_WAIT memory allocation requests by workqueue
items can block vmstat_update work item forever.
Since zone_reclaimable() decision depends on vmstat counters
to be up to dated, a silent lockup occurs because a workqueue
item doing a __GFP_WAIT memory allocation request continues
using outdated vmstat counters.
In order to fix this problem, we need to allocate a dedicated
workqueue for vmstat. Note that this patch itself does not fix
lockup problem. Tejun will develop a patch which detects lockup
situation and kick remaining workqueue items.
----------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 3/3] vmstat: Create our own workqueue
2015-11-06 11:28 ` Tetsuo Handa
@ 2015-11-06 12:56 ` Christoph Lameter
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-11-06 12:56 UTC (permalink / raw)
To: Tetsuo Handa
Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes,
mgorman
On Fri, 6 Nov 2015, Tetsuo Handa wrote:
> So, if you refer to the blocking of the execution of vmstat updates,
> description for patch 3/3 sould be updated to something like below?
Ok that is much better.
> ----------
> Since __GFP_WAIT memory allocations do not call schedule()
> when there is nothing to reclaim, and workqueue does not kick
> remaining workqueue items unless in-flight workqueue item calls
> schedule(), __GFP_WAIT memory allocation requests by workqueue
> items can block vmstat_update work item forever.
>
> Since zone_reclaimable() decision depends on vmstat counters
> to be up to dated, a silent lockup occurs because a workqueue
> item doing a __GFP_WAIT memory allocation request continues
> using outdated vmstat counters.
>
> In order to fix this problem, we need to allocate a dedicated
> workqueue for vmstat. Note that this patch itself does not fix
> lockup problem. Tejun will develop a patch which detects lockup
> situation and kick remaining workqueue items.
> ----------
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread