All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>,
	lizefan@huawei.com, axboe@kernel.dk, dennis@kernel.org,
	Dennis Zhou <dennisszhou@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	cgroups@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	kernel-team@android.com
Subject: Re: [PATCH v2 5/5] psi: introduce psi monitor
Date: Mon, 14 Jan 2019 14:42:22 -0500	[thread overview]
Message-ID: <20190114194222.GA10571@cmpxchg.org> (raw)
In-Reply-To: <CAJuCfpGUWs0E9oPUjPTNm=WhPJcE_DBjZCtCiaVu5WXabKRW6A@mail.gmail.com>

On Mon, Jan 14, 2019 at 11:30:12AM -0800, Suren Baghdasaryan wrote:
> On Mon, Jan 14, 2019 at 2:22 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jan 10, 2019 at 02:07:18PM -0800, Suren Baghdasaryan wrote:
> > > +/*
> > > + * psi_update_work represents slowpath accounting part while
> > > + * psi_group_change represents hotpath part.
> > > + * There are two potential races between these path:
> > > + * 1. Changes to group->polling when slowpath checks for new stall, then
> > > + *    hotpath records new stall and then slowpath resets group->polling
> > > + *    flag. This leads to the exit from the polling mode while monitored
> > > + *    states are still changing.
> > > + * 2. Slowpath overwriting an immediate update scheduled from the hotpath
> > > + *    with a regular update further in the future and missing the
> > > + *    immediate update.
> > > + * Both races are handled with a retry cycle in the slowpath:
> > > + *
> > > + *    HOTPATH:                         |    SLOWPATH:
> > > + *                                     |
> > > + * A) times[cpu] += delta              | E) delta = times[*]
> > > + * B) start_poll = (delta[poll_mask] &&|    if delta[poll_mask]:
> > > + *      cmpxchg(g->polling, 0, 1) == 0)| F)   polling_until = now +
> > > + *                                     |              grace_period
> > > + *                                     |    if now > polling_until:
> > > + *    if start_poll:                   |      if g->polling:
> > > + * C)   mod_delayed_work(1)            | G)     g->polling = polling = 0
> > > + *    else if !delayed_work_pending(): | H)     goto SLOWPATH
> > > + * D)   schedule_delayed_work(PSI_FREQ)|    else:
> > > + *                                     |      if !g->polling:
> > > + *                                     | I)     g->polling = polling = 1
> > > + *                                     | J) if delta && first_pass:
> > > + *                                     |      next_avg = calculate_averages()
> > > + *                                     |      if polling:
> > > + *                                     |        next_poll = poll_triggers()
> > > + *                                     |    if (delta && first_pass) || polling:
> > > + *                                     | K)   mod_delayed_work(
> > > + *                                     |          min(next_avg, next_poll))
> > > + *                                     |      if !polling:
> > > + *                                     |        first_pass = false
> > > + *                                     | L)     goto SLOWPATH
> > > + *
> > > + * Race #1 is represented by (EABGD) sequence in which case slowpath
> > > + * deactivates polling mode because it misses new monitored stall and hotpath
> > > + * doesn't activate it because at (B) g->polling is not yet reset by slowpath
> > > + * in (G). This race is handled by the (H) retry, which in the race described
> > > + * above results in the new sequence of (EABGDHEIK) that reactivates polling
> > > + * mode.
> > > + *
> > > + * Race #2 is represented by polling==false && (JABCK) sequence which
> > > + * overwrites immediate update scheduled at (C) with a later (next_avg) update
> > > + * scheduled at (K). This race is handled by the (L) retry which results in the
> > > + * new sequence of polling==false && (JABCKLEIK) that reactivates polling mode
> > > + * and reschedules next polling update (next_poll).
> > > + *
> > > + * Note that retries can't result in an infinite loop because retry #1 happens
> > > + * only during polling reactivation and retry #2 happens only on the first
> > > + * pass. Constant reactivations are impossible because polling will stay active
> > > + * for at least grace_period. Worst case scenario involves two retries (HEJKLE)
> > > + */
> >
> > I'm having a fairly hard time with this. There's a distinct lack of
> > memory ordering, and a suspicious mixing of atomic ops (cmpxchg) and
> > regular loads and stores (without READ_ONCE/WRITE_ONCE even).
> >
> > Please clarify.
> 
> Thanks for the feedback.
> I do mix atomic and regular loads with g->polling only because the
> slowpath is the only one that resets it back to 0, so
> cmpxchg(g->polling, 1, 0) == 1 at (G) would always return 1.
> Setting g->polling back to 1 at (I) indeed needs an atomic operation
> but at that point it does not matter whether hotpath or slowpath sets
> it. In either case we will schedule a polling update.
> Am I missing anything?
> 
> For memory ordering (which Johannes also pointed out) the critical point is:
> 
> times[cpu] += delta           | if g->polling:
> smp_wmb()                     |   g->polling = polling = 0
> cmpxchg(g->polling, 0, 1)     |   smp_rmb()
>                               |   delta = times[*] (through goto SLOWPATH)
> 
> So that hotpath writes to times[] then g->polling and slowpath reads
> g->polling then times[]. cmpxchg() implies a full barrier, so we can
> drop smp_wmb(). Something like this:
> 
> times[cpu] += delta           | if g->polling:
> cmpxchg(g->polling, 0, 1)     |   g->polling = polling = 0
>                               |   smp_rmb()
>                               |   delta = times[*] (through goto SLOWPATH)

delta = times[*] is get_recent_times(), which uses a seqcount and so
implies the smp_rmb() already as well. So we shouldn't need another
explicit one. But the comment should point out all the barriers.

  reply	other threads:[~2019-01-14 19:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-10 22:07 [PATCH v2 0/5] psi: pressure stall monitors v2 Suren Baghdasaryan
2019-01-10 22:07 ` [PATCH v2 1/5] fs: kernfs: add poll file operation Suren Baghdasaryan
2019-01-11  6:25   ` Greg KH
2019-01-10 22:07 ` [PATCH v2 2/5] kernel: cgroup: " Suren Baghdasaryan
2019-01-10 22:07 ` [PATCH v2 3/5] psi: introduce state_mask to represent stalled psi states Suren Baghdasaryan
2019-01-10 22:07 ` [PATCH v2 4/5] psi: rename psi fields in preparation for psi trigger addition Suren Baghdasaryan
2019-01-10 22:07 ` [PATCH v2 5/5] psi: introduce psi monitor Suren Baghdasaryan
2019-01-14 10:21   ` Peter Zijlstra
2019-01-14 19:30     ` Suren Baghdasaryan
2019-01-14 19:42       ` Johannes Weiner [this message]
2019-01-14 20:47         ` Suren Baghdasaryan
2019-01-16 13:24       ` Peter Zijlstra
2019-01-16 17:39         ` Suren Baghdasaryan
2019-01-16 19:17           ` Johannes Weiner
2019-01-16 19:27             ` Johannes Weiner
2019-01-16 21:29               ` Suren Baghdasaryan
2019-01-17  8:13             ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190114194222.GA10571@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=dennis@kernel.org \
    --cc=dennisszhou@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kernel-team@android.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.