From: Ming Lei <ming.lei@redhat.com>
To: Waiman Long <longman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>, Jens Axboe <axboe@kernel.dk>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 3/3] blk-cgroup: Optimize blkcg_rstat_flush()
Date: Mon, 6 Jun 2022 11:16:29 +0800
Message-ID: <Yp1xjRyU9L5FiWXQ@T590>
In-Reply-To: <20220602192020.166940-4-longman@redhat.com>
On Thu, Jun 02, 2022 at 03:20:20PM -0400, Waiman Long wrote:
> For a system with many CPUs and block devices, the time to do
> blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. This
> is especially problematic because interrupts are disabled during the
> flush. In some extreme cases the flush was reported to take seconds to
> complete, leading to hard lockup messages.
>
> As it is likely that not all the percpu blkg_iostat_set's have been
> updated since the last flush, those stale blkg_iostat_set's don't need
> to be flushed in this case. This patch optimizes blkcg_rstat_flush()
> by keeping a lockless list of recently updated blkg_iostat_set's in a
> newly added percpu blkcg->lhead pointer.
>
> The blkg_iostat_set is added to the lockless list on the update side
> in blk_cgroup_bio_start(). It is removed from the lockless list when
> flushed in blkcg_rstat_flush(). Due to racing, it is possible that
> blkg_iostat_set's in the lockless list may have no new IO stats to be
> flushed. To protect against destruction of blkg, a percpu reference is
> taken when putting into the lockless list and put back when removed.
>
> A blkg_iostat_set can determine if it is in a lockless list by checking
> the content of its lnode.next pointer which will be non-NULL when in
> a lockless list. This requires the presence of a special llist_last
> sentinel node to be put at the end of the lockless list.
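For readers following the thread, the add/flush scheme described above can
be sketched in userspace C11. This is only an illustration under stated
assumptions, not the patch's code: struct node, struct list_head,
list_last, list_add_once() and list_flush() are hypothetical names
standing in for the kernel's llist types and the blkcg helpers, each list
is assumed to have a single updater (as with the percpu lists in the
patch), and the percpu reference counting is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry embedding a lockless-list node. */
struct node {
    _Atomic(struct node *) next;  /* NULL means "not on any list" */
    int id;
};

struct list_head {
    _Atomic(struct node *) first;
};

/* Sentinel terminating every list, so even the tail has a non-NULL next. */
static struct node list_last;

static void node_init(struct node *n, int id)
{
    atomic_init(&n->next, (struct node *)NULL);
    n->id = id;
}

static void list_init(struct list_head *h)
{
    atomic_init(&h->first, &list_last);
}

/* Membership test: a queued node always points at a node or the sentinel. */
static bool node_on_list(struct node *n)
{
    return atomic_load(&n->next) != NULL;
}

/*
 * Update side: queue n unless it is already queued.
 * Assumes one updater per list. Returns true if this call queued n.
 */
static bool list_add_once(struct list_head *h, struct node *n)
{
    if (node_on_list(n))
        return false;

    struct node *old = atomic_load(&h->first);
    do {
        atomic_store(&n->next, old);
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
    return true;
}

/*
 * Flush side: detach the whole list in one exchange, then walk it.
 * Each node's next is cleared so it can be queued again later.
 * Records up to max ids in ids[] and returns the number of nodes seen.
 */
static int list_flush(struct list_head *h, int *ids, int max)
{
    struct node *n = atomic_exchange(&h->first, &list_last);
    int count = 0;

    while (n != &list_last) {
        struct node *next = atomic_load(&n->next);

        atomic_store(&n->next, (struct node *)NULL);
        if (count < max)
            ids[count] = n->id;
        count++;
        n = next;
    }
    return count;
}
```

An empty list holds only the sentinel, so list_flush() returns 0 without
walking anything, which corresponds to the early-exit case counted in the
measurements quoted below. Without the sentinel, the tail node's next
would be NULL and the membership test could not tell it apart from a node
that was never queued.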
>
> When booting up an instrumented test kernel with this patch on a
> 2-socket 96-thread system with cgroup v2, out of the 2051 calls to
> cgroup_rstat_flush() after bootup, 1788 of the calls exited
> immediately because the lockless list was empty. After an all-cpu
> kernel build, the ratio became 6295424/6340513. That was more than 99%.
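The two ratios can be checked with a few lines of C. The helper name
skip_percent() is made up for this illustration, and the sketch assumes
both figures are read as "calls that exited early" over "total calls".

```c
#include <assert.h>

/* Percentage of flush calls that exited early; 0 if there were no calls. */
static double skip_percent(unsigned long skipped, unsigned long total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)skipped / (double)total;
}
```

With the quoted numbers, skip_percent(1788, 2051) comes out at roughly 87%
after bootup, and skip_percent(6295424, 6340513) at roughly 99.3% after
the kernel build, consistent with the "more than 99%" statement.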
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming