Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Tejun Heo <tj@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Li Zefan <lizefan@huawei.com>,
	Anton Vorontsov <anton.vorontsov@linaro.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
Date: Fri, 12 Jul 2013 11:34:04 -0700	[thread overview]
Message-ID: <20130712183404.GA23680@mtj.dyndns.org> (raw)
In-Reply-To: <20130712084039.GA13224@dhcp22.suse.cz>

Hello, Michal.

On Fri, Jul 12, 2013 at 10:40:39AM +0200, Michal Hocko wrote:
> > It's not something white and black but for things which can be made
> > trivially synchrnous, it usually is better to do it that way,
> 
> True in general but it is also true (in general) that once we have a
> reference counting for controlling life cycle for an object we should
> not bypass it.

When you have a reference count with staged destruction like cgroup,
the object goes two stagages before being released.  It first gets
deactivated or removed from further access and the base ref is
dropped.  This state presists until the left-over references drain.
When all the references are dropped, the object is actually released.

Now, between the initiation of destruction and actual release, the
object often isn't in the same state as before the destruction
started.  It depends on the specifics of the subsystem but usually
only subset of the original functionalities are accessible.  At times,
it becomes tricky discerning what's safe and what's not and we do end
up with latent subtle bugs which are only triggered when tricky
conditions are met - things like certain operations stretching across
destruction point.

It is important to distinguish the boundary between things which
remain accessible after destruction point and it often is easier and
less dangerous to avoid expanding that set unless necessary, so it is
not "bypassing" an existing mechanism at all.  It is an inherent part
of that model and various kernel subsystems have been doing that
forever.

> So I guess we are safe with the code as is but this all is really
> _tricky_ and deserves a fat comment. So rather than adding flushing work
> item code we should document it properly.
> 
> Or am I missing something?

Two things.

* I have no idea what would prevent the work item execution kicking in
  way after the inode is released.  The execution is not flushed and
  the work item isn't pinning the underlying data structure.  There's
  nothing preventing the work item execution to happen after or
  stretch across inode release.

* You're turning something which has a clearly established pattern of
  usage - work item shutdown which is usually done by disabling
  issuance and then flushing / canceling - into something really
  tricky and deserving a fat comment.  In general, kernel engineering
  is never about finding the most optimal / minimal / ingenious
  solution for each encountered instances of problems.  In most cases,
  it's usually about identifying established patterns and applying
  them so that the code is not only safe and correct but also remains
  accessible to the fellow kernel developers.

  Here, people who are familiary with workqueue usage would
  automatically look for shutdown sequence as the work item is
  contained in a dynamic object.  There are cases where
  synchronization happens externally and you don't necessarily need
  flush / cancel, but even then, if you consider readability and
  maintainability, following the established pattern is often then
  right thing to do.  It makes the code easier to follow and verify
  for others and avoids unintentional breakages afterwards as you
  separate out work item draining from the external synchronization
  which may change later on without noticing how it's interlocked with
  work item draining.

> OK, I haven't realized the action waits for finishing. /me is not
> regular work_queue user...

So, please, follow the established patterns.  This really isn't the
place for ingenuity.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-07-12 18:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130710184254.GA16979@mtj.dyndns.org>
     [not found] ` <20130711083110.GC21667@dhcp22.suse.cz>
     [not found]   ` <51DE701C.6010800@huawei.com>
     [not found]     ` <20130711092542.GD21667@dhcp22.suse.cz>
     [not found]       ` <51DE7AAF.6070004@huawei.com>
2013-07-11  9:33         ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Michal Hocko
2013-07-11 15:44           ` Tejun Heo
2013-07-11 16:22             ` Michal Hocko
2013-07-11 16:32               ` Tejun Heo
2013-07-12  8:40                 ` Michal Hocko
2013-07-12  9:20                   ` Li Zefan
2013-07-12  9:29                     ` Michal Hocko
2013-07-12  9:54                       ` Li Zefan
2013-07-12 10:37                         ` Michal Hocko
2013-07-15  3:07                           ` Li Zefan
2013-07-15  9:20                             ` Michal Hocko
2013-07-15  9:53                               ` Li Zefan
2013-07-12  9:24                   ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Michal Hocko
2013-07-12  9:24                     ` [PATCH 2/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
2013-07-12  9:24                     ` [PATCH 3/3] vmpressure: do not check for pending work to prevent from new work Michal Hocko
2013-07-12 18:48                     ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Tejun Heo
2013-07-15 10:27                       ` Michal Hocko
2013-07-12 18:34                   ` Tejun Heo [this message]
2013-07-12 18:40                     ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Tejun Heo
2013-07-12  6:03               ` Li Zefan
2013-07-15 10:30             ` [PATCH v3 1/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
2013-07-15 10:30               ` [PATCH v3 2/3] vmpressure: do not check for pending work to prevent from new work Michal Hocko
2013-07-15 10:30               ` [PATCH v3 3/3] vmpressure: Make sure there are no events queued after memcg is offlined Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130712183404.GA23680@mtj.dyndns.org \
    --to=tj@kernel.org \
    --cc=anton.vorontsov@linaro.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).