From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Minchan Kim <minchan@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@techsingularity.net>,
Nicolas Saenz Julienne <nsaenzju@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [patch v3] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu
Date: Fri, 4 Mar 2022 12:11:36 -0300 [thread overview]
Message-ID: <YiIsKHiX9h0j1G2o@fuller.cnet> (raw)
In-Reply-To: <20220303170323.82d8424d214fcb3a32155952@linux-foundation.org>
On Thu, Mar 03, 2022 at 05:03:23PM -0800, Andrew Morton wrote:
> (Question for paulmck below, please)
>
> On Tue, 22 Feb 2022 13:07:35 -0300 Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> >
> > On systems that run FIFO:1 applications that busy loop
> > on isolated CPUs, executing tasks on such CPUs under
> > lower priority is undesired (since that will either
> > hang the system, or cause longer interruption to the
> > FIFO task due to execution of lower priority task
> > with very small sched slices).
> >
> > Commit d479960e44f27e0e52ba31b21740b703c538027c ("mm: disable LRU
> > pagevec during the migration temporarily") relies on
> > queueing work items on all online CPUs to ensure visibility
> > of lru_disable_count.
> >
> > However, its possible to use synchronize_rcu which will provide the same
> > guarantees (see comment this patch modifies on lru_cache_disable).
> >
> > Fixes:
> >
> > [ 1873.243925] INFO: task kworker/u160:0:9 blocked for more than 622 seconds.
> > [ 1873.243927] Tainted: G I --------- --- 5.14.0-31.rt21.31.el9.x86_64 #1
> > [ 1873.243929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 1873.243929] task:kworker/u160:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00004000
> > [ 1873.243932] Workqueue: cpuset_migrate_mm cpuset_migrate_mm_workfn
> > [ 1873.243936] Call Trace:
> > [ 1873.243938] __schedule+0x21b/0x5b0
> > [ 1873.243941] schedule+0x43/0xe0
> > [ 1873.243943] schedule_timeout+0x14d/0x190
> > [ 1873.243946] ? resched_curr+0x20/0xe0
> > [ 1873.243953] ? __prepare_to_swait+0x4b/0x70
> > [ 1873.243958] wait_for_completion+0x84/0xe0
> > [ 1873.243962] __flush_work.isra.0+0x146/0x200
> > [ 1873.243966] ? flush_workqueue_prep_pwqs+0x130/0x130
> > [ 1873.243971] __lru_add_drain_all+0x158/0x1f0
> > [ 1873.243978] do_migrate_pages+0x3d/0x2d0
> > [ 1873.243985] ? pick_next_task_fair+0x39/0x3b0
> > [ 1873.243989] ? put_prev_task_fair+0x1e/0x30
> > [ 1873.243992] ? pick_next_task+0xb30/0xbd0
> > [ 1873.243995] ? __tick_nohz_task_switch+0x1e/0x70
> > [ 1873.244000] ? raw_spin_rq_unlock+0x18/0x60
> > [ 1873.244002] ? finish_task_switch.isra.0+0xc1/0x2d0
> > [ 1873.244005] ? __switch_to+0x12f/0x510
> > [ 1873.244013] cpuset_migrate_mm_workfn+0x22/0x40
> > [ 1873.244016] process_one_work+0x1e0/0x410
> > [ 1873.244019] worker_thread+0x50/0x3b0
> > [ 1873.244022] ? process_one_work+0x410/0x410
> > [ 1873.244024] kthread+0x173/0x190
> > [ 1873.244027] ? set_kthread_struct+0x40/0x40
> > [ 1873.244031] ret_from_fork+0x1f/0x30
> >
> > ...
> >
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -831,8 +831,7 @@ inline void __lru_add_drain_all(bool force_all_cpus)
> > for_each_online_cpu(cpu) {
> > struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> >
> > - if (force_all_cpus ||
> > - pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
> > + if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
> > data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
> > pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
> > pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
>
> This change appears to be "don't queue work on CPUs which don't have
> any work to do". Correct? This isn't changelogged?
>
> > @@ -876,14 +875,19 @@ atomic_t lru_disable_count = ATOMIC_INIT(0);
> > void lru_cache_disable(void)
> > {
> > atomic_inc(&lru_disable_count);
> > + synchronize_rcu();
> > #ifdef CONFIG_SMP
> > /*
> > - * lru_add_drain_all in the force mode will schedule draining on
> > - * all online CPUs so any calls of lru_cache_disabled wrapped by
> > - * local_lock or preemption disabled would be ordered by that.
> > - * The atomic operation doesn't need to have stronger ordering
> > - * requirements because that is enforced by the scheduling
> > - * guarantees.
> > + * synchronize_rcu() waits for preemption disabled
> > + * and RCU read side critical sections.
> > + * For the users of lru_disable_count:
> > + *
> > + * preempt_disable, local_irq_disable [bh_lru_lock()]
> > + * rcu_read_lock [rt_spin_lock CONFIG_PREEMPT_RT]
> > + * preempt_disable [local_lock !CONFIG_PREEMPT_RT]
> > + *
> > + * so any calls of lru_cache_disabled wrapped by local_lock or
> > + * preemption disabled would be ordered by that.
> > */
> > __lru_add_drain_all(true);
> > #else
>
> Does this also work with CONFIG_TINY_RCU?
>
> This seems abusive of synchronize_rcu(). None of this code uses RCU,
> but it so happens that synchronize_rcu() happily provides the desired
> effects. Changes in RCU's happy side-effects might break this.
> Perhaps a formal API function which does whatever-you-want-it-to-do
> would be better.
>
> And... I really don't understand the fix. What is it about
> synchronize_rcu() which guarantees that a work function which is queued
> on CPU N will now get executed even if CPU N is spinning in SCHED_FIFO
> userspace?
It does not. synchronize_rcu() replaces queueing the work functions,
to ensure visibility of lru_disable_count.
next prev parent reply other threads:[~2022-03-04 15:12 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-22 16:01 [patch v2] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu Marcelo Tosatti
2022-02-22 16:07 ` [patch v3] " Marcelo Tosatti
2022-02-22 16:25 ` Nicolas Saenz Julienne
2022-03-04 1:03 ` Andrew Morton
2022-03-04 1:49 ` Paul E. McKenney
2022-03-04 15:08 ` Marcelo Tosatti
2022-03-04 16:02 ` Paul E. McKenney
2022-03-04 15:11 ` Marcelo Tosatti [this message]
2022-03-04 16:29 ` [patch v4] " Marcelo Tosatti
2022-03-05 0:35 ` Andrew Morton
2022-03-07 18:52 ` Marcelo Tosatti
2022-03-10 13:22 ` [patch v5] " Marcelo Tosatti
2022-03-11 2:23 ` Andrew Morton
2022-03-11 8:35 ` Sebastian Andrzej Siewior
2022-03-12 0:40 ` Andrew Morton
2022-03-12 20:39 ` Marcelo Tosatti
2022-03-13 9:23 ` Hillf Danton
2022-03-31 13:52 ` Borislav Petkov
2022-04-28 18:00 ` Marcelo Tosatti
2022-05-28 21:18 ` Andrew Morton
2022-05-28 22:54 ` Michael Larabel
2022-05-29 0:48 ` Michael Larabel
2022-06-19 12:14 ` Thorsten Leemhuis
2022-06-22 0:15 ` Andrew Morton
2022-03-05 4:33 ` [patch v4] " Paul E. McKenney
2022-03-08 17:41 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YiIsKHiX9h0j1G2o@fuller.cnet \
--to=mtosatti@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bigeasy@linutronix.de \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=minchan@kernel.org \
--cc=nsaenzju@redhat.com \
--cc=paulmck@kernel.org \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.