Re: [PATCH RFC] fsio: filesystem io accounting cgroup

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Sha Zhengju <handai.szj@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Vivek Goyal <vgoyal@redhat.com>, Tejun Heo <tj@kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	"devel@openvz.org" <devel@openvz.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH RFC] fsio: filesystem io accounting cgroup
Date: Wed, 10 Jul 2013 10:03:27 +0400	[thread overview]
Message-ID: <51DCF92F.6090409@openvz.org> (raw)
In-Reply-To: <CAFj3OHVtVGDnWHqNBRZH+LNtzDrbk8PO0fKLwFscZAWCJRW9oA@mail.gmail.com>

Sha Zhengju wrote:
> Hi,
>
> On Mon, Jul 8, 2013 at 5:59 PM, Konstantin Khlebnikov
> <khlebnikov@openvz.org>  wrote:
>> >  This is proof of concept, just basic functionality for IO controller.
>> >  This cgroup will control filesystem usage on vfs layer, it's main goal is
>> >  bandwidth control. It's supposed to be much more lightweight than memcg/blkio.
>> >
>> >  This patch shows easy way for accounting pages in dirty/writeback state in
>> >  per-inode manner. This is easier that doing this in memcg in per-page manner.
>> >  Main idea is in keeping on each inode pointer (->i_fsio) to cgroup which owns
>> >  dirty data in that inode. It's settled by fsio_account_page_dirtied() when
>> >  first dirty tag appears in the inode. Relying to mapping tags gives us locking
>> >  for free, this patch doesn't add any new locks to hot paths.
> While referring to dirty/writeback numbers, what I care about is 'how
> many dirties in how many memory' and later may use the proportion to
> decide throttling or something else. So if you are talking about nr of
> dirty pages without memcg's amount of memory, I don't see the meaning
> of a single number.

I'm planning to add some thresholds or limits to fsio cgroup -- how many dirty pages
this cgroup may have. memcg is completely different thing: memcg controls data storage
while fsio controls data flows. Memcg already handles too much, I just don't want add
yet another unrelated stuff into it. Otherwise we will end with one single controller
which would handle all possible resources, because they all related in some cases.

>
> What's more, counting dirty/writeback stats in per-node manner can
> bring inaccuracy in some situations: considering two tasks from
> different fsio cgroups are dirtying one file concurrently but may only
> be counting in one fsio stats, or a task is moved to another fsio
> cgroup after dirtrying one file. As talking about task moving, it is
> the root cause of adding memcg locks in page stat routines, since
> there's a race window between 'modify cgroup owner' and 'update stats
> using cgroup pointer'. But if you are going to handle task move or
> take care of ->i_fsio for better accuracy in future, I'm afraid you
> will also need some synchronization mechanism in hot paths. Maybe also
> a new lock or mapping->tree_lock(which is already hot enough) IMHO.

Yes, per-inode accounting is less accurate. But this approach works really
well in the real life. I don't want add new locks and loose performance just
to fix accuracy for some artificial cases.

to Tejun:
BTW I don't like that volatility of task's cgroup ponters. I'd like to forbid
moving tasks between cgroups except for 'current', existing behavior can be
kept with help of task_work: instead external change of task->cgroups we can
schedule task_work into it in and change that pointer in the 'current' context.
That will save us a lot of rcu_lock/unlock and atomic operations in grabbing
temporary pointers to current cgroup because current->cgroups will be stable.
I don't think that external cross-cgroup task migration is really performance
critical. Currently I don't know what to do with kernel threads and workqueues,
but any way this problem doesn't look unsolvable.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-07-10  6:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08  9:59 [PATCH RFC] fsio: filesystem io accounting cgroup Konstantin Khlebnikov
2013-07-10  4:43 ` Sha Zhengju
2013-07-10  6:03   ` Konstantin Khlebnikov [this message]
2013-07-10  8:37     ` Sha Zhengju
  -- strict thread matches above, loose matches on Subject: below --
2013-07-08 10:01 Konstantin Khlebnikov
2013-07-08 17:00 ` Tejun Heo
2013-07-08 17:52   ` Vivek Goyal
2013-07-08 17:56     ` Tejun Heo
2013-07-09  8:28       ` Konstantin Khlebnikov
2013-07-09 12:57         ` Tejun Heo
2013-07-09 13:15           ` Konstantin Khlebnikov
2013-07-09 13:16             ` Tejun Heo
2013-07-09 13:16               ` Tejun Heo
2013-07-09 13:43                 ` Konstantin Khlebnikov
2013-07-09 13:45                   ` Tejun Heo
2013-07-09 14:18                     ` Vivek Goyal
2013-07-09 14:29                       ` Tejun Heo
2013-07-09 14:54                         ` Vivek Goyal
2013-07-09 15:08                           ` Tejun Heo
     [not found]                             ` <20130710030955.GA3569@redhat.com>
2013-07-10  3:50                               ` Tejun Heo
2013-07-09 14:35                     ` Konstantin Khlebnikov
2013-07-09 14:42                       ` Tejun Heo
2013-07-09 15:06                       ` Vivek Goyal
2013-07-09 17:42                         ` Konstantin Khlebnikov
2013-07-09 18:35                           ` Vivek Goyal
2013-07-09 20:54                             ` Konstantin Khlebnikov
2013-07-08 18:11 ` Vivek Goyal
2013-07-09 15:39 ` Theodore Ts'o
2013-07-09 17:12   ` Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51DCF92F.6090409@openvz.org \
    --to=khlebnikov@openvz.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=devel@openvz.org \
    --cc=handai.szj@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).