public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: axboe@kernel.dk, ctalbott@google.com, rni@google.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/17] blkcg: shoot down blkio_groups on elevator switch
Date: Mon, 23 Jan 2012 09:10:49 -0800	[thread overview]
Message-ID: <20120123171049.GH12652@google.com> (raw)
In-Reply-To: <20120123162553.GI25986@redhat.com>

On Mon, Jan 23, 2012 at 11:25:53AM -0500, Vivek Goyal wrote:
> It does not have to be a regular switch. Even one switch during boot
> can create issues. 
> 
> In RHEL we have the set of scripts which can do system tuning like based
> on user chosen profile (tuned). These scripts do various things including
> changing elevator. Once you have chosen the profile, it gets applied
> automatically over every boot (through init scripts).
>
> Now assume that after a reboot libvirtd is running and resuming various
> suspended virtual machines or starting new one and in parallel this
> profile is being applied. There is no way to avoid races as systemd allows
> parallel execution of services. The only way left will be strong
> serialization and that is no cgroup operation is taking place in the
> system while some init script is chaning the elevator (no new cgroup
> creatoin, no cgroup deletions and no rule settings by any daemon),
> otherwise changes might be lost. In practice how would I program 
> various init scripts for this?

Why can't systemd order elevator switch before other actions?  It's
not really about switching elevators but about having set of applied
policies set before configuring them.

It is natural to require the target of configuration to be set up
before configuring it, right?  You can't set attributes on eth0 or sda
when those don't exist.  This isn't very different.  You need to have
set of policies and their parameters defined before going ahead with
their configurations and there naturally is ordering between the two
steps - e.g. it doesn't make any sense and is actually misleading to
allow configuration of propio when the elevator in choice doesn't
provide it.

Of course, details of such ordering requirement including granularity
have to be decided and we can decide that keeping things at per-policy
granularity is important enough to justify extra complexity, which I
don't think is the case here.

There are two separate points here.

1. Regardless of persistency granularity, which policies are enabled
   for a device must be determined before configuring the policies.
   The policy_node stuff worked around this by keeping per-policy
   configurations in the core separately violating proper layering and
   any usual conventions.  It's like keeping ata_N_conf or eth_N_conf
   in kernel for devices which may appear in the future.  It's silly
   at best.

2. The granularity of configuration reset is a separate issue and it
   might make sense to do it fine-grained if that is important enough,
   but given how elv/pol changes are used, I am very skeptical this is
   necessary.

No matter what we do for #2, #1 requires ordering between policy
selection and configuration.  You're saying that #2, combined with the
fact that blk-throtl can't be built as module or disabled on runtime,
allows side-stepping the issue for at least blk-throtl.  That doesn't
sound like a good idea to me.  People are working on different
elevators implementing different cgroup strategies.  There is no sane
way around requiring "choosing of policies" to happen before
"configuration of chosen policies".

-- 
tejun

  reply	other threads:[~2012-01-23 17:10 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-22  3:25 [PATCHSET] blkcg: kill policy node and blkg->dev, take#2 Tejun Heo
2012-01-22  3:25 ` [PATCH 01/17] blkcg: make CONFIG_BLK_CGROUP bool Tejun Heo
2012-01-23 15:00   ` Vivek Goyal
2012-01-23 15:34     ` Tejun Heo
2012-01-22  3:25 ` [PATCH 02/17] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED Tejun Heo
2012-01-22  3:25 ` [PATCH 03/17] elevator: clear auxiliary data earlier during elevator switch Tejun Heo
2012-01-22  3:25 ` [PATCH 04/17] elevator: make elevator_init_fn() return 0/-errno Tejun Heo
2012-01-22  3:25 ` [PATCH 05/17] block: implement blk_queue_bypass_start/end() Tejun Heo
2012-01-22  3:25 ` [PATCH 06/17] block: extend queue bypassing to cover blkcg policies Tejun Heo
2012-01-22  3:25 ` [PATCH 07/17] blkcg: make blkio_list_lock irq-safe Tejun Heo
2012-01-22  3:25 ` [PATCH 08/17] blkcg: shoot down blkio_groups on elevator switch Tejun Heo
2012-01-23 15:20   ` Vivek Goyal
2012-01-23 15:36     ` Vivek Goyal
2012-01-23 15:49       ` Tejun Heo
2012-01-23 15:39     ` Tejun Heo
2012-01-23 15:52       ` Vivek Goyal
2012-01-23 15:57         ` Tejun Heo
2012-01-23 16:10           ` Vivek Goyal
2012-01-23 16:13             ` Vivek Goyal
2012-01-23 16:20               ` Tejun Heo
2012-01-23 16:28                 ` Vivek Goyal
2012-01-23 16:32                   ` Tejun Heo
2012-01-23 16:16             ` Tejun Heo
2012-01-23 16:25               ` Vivek Goyal
2012-01-23 17:10                 ` Tejun Heo [this message]
2012-01-23 18:27                   ` Vivek Goyal
2012-01-23 18:43                     ` Tejun Heo
2012-01-23 19:33                       ` Tejun Heo
2012-01-23 19:57                         ` Vivek Goyal
2012-01-23 20:33                           ` Tejun Heo
2012-01-23 20:43                       ` Lennart Poettering
2012-01-23 20:47                         ` Tejun Heo
2012-01-23 21:03                           ` Vivek Goyal
2012-01-23 20:40                     ` Lennart Poettering
2012-01-23 18:32                   ` Vivek Goyal
2012-01-23 18:51                     ` Tejun Heo
2012-01-22  3:25 ` [PATCH 09/17] blkcg: move rcu_read_lock() outside of blkio_group get functions Tejun Heo
2012-01-22  3:25 ` [PATCH 10/17] blkcg: update blkg get functions take blkio_cgroup as parameter Tejun Heo
2012-01-22  3:25 ` [PATCH 11/17] blkcg: use q and plid instead of opaque void * for blkio_group association Tejun Heo
2012-01-22  3:25 ` [PATCH 12/17] blkcg: add blkio_policy[] array and allow one policy per policy ID Tejun Heo
2012-01-22  3:25 ` [PATCH 13/17] blkcg: use the usual get blkg path for root blkio_group Tejun Heo
2012-01-22  3:25 ` [PATCH 14/17] blkcg: factor out blkio_group creation Tejun Heo
2012-01-22  3:25 ` [PATCH 15/17] blkcg: don't allow or retain configuration of missing devices Tejun Heo
2012-01-22  3:25 ` [PATCH 16/17] blkcg: kill blkio_policy_node Tejun Heo
2012-01-22  3:25 ` [PATCH 17/17] blkcg: kill the mind-bending blkg->dev Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120123171049.GH12652@google.com \
    --to=tj@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=ctalbott@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rni@google.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox