From: Andrea Righi <andrea.righi@canonical.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
    Paolo Valente <paolo.valente@linaro.org>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Jens Axboe <axboe@kernel.dk>, Vivek Goyal <vgoyal@redhat.com>,
    Dennis Zhou <dennis@kernel.org>,
    cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()
Date: Fri, 8 Mar 2019 08:38:43 +0100
Message-ID: <20190308073843.GA9732@xps-13>
In-Reply-To: <20190307221051.ruhpp73q6ek2at3d@macbook-pro-91.dhcp.thefacebook.com>
On Thu, Mar 07, 2019 at 05:10:53PM -0500, Josef Bacik wrote:
> On Thu, Mar 07, 2019 at 07:08:32PM +0100, Andrea Righi wrote:
> > Prevent a priority inversion problem when a high-priority blkcg issues a
> > sync() and is forced to wait for the completion of all the writeback I/O
> > generated by any other low-priority blkcg, causing massive latencies for
> > processes that shouldn't be I/O-throttled at all.
> >
> > The idea is to keep a list of blkcgs that are waiting for writeback:
> > every time a sync() is executed, the current blkcg is added to the list.
> >
> > Then, when I/O is throttled, if a blkcg other than the current one is
> > waiting for writeback, no throttling is applied (we can probably refine
> > this logic later, e.g., a better policy could be to adjust the throttling
> > I/O rate using the blkcg with the highest speed from the list of
> > waiters - priority inheritance, kinda).
> >
> > Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
> > ---
> >  block/blk-cgroup.c               | 131 +++++++++++++++++++++++++++++++
> >  block/blk-throttle.c             |  11 ++-
> >  fs/fs-writeback.c                |   5 ++
> >  fs/sync.c                        |   8 +-
> >  include/linux/backing-dev-defs.h |   2 +
> >  include/linux/blk-cgroup.h       |  23 ++++++
> >  mm/backing-dev.c                 |   2 +
> >  7 files changed, 178 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 2bed5725aa03..4305e78d1bb2 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -1351,6 +1351,137 @@ struct cgroup_subsys io_cgrp_subsys = {
> > };
> > EXPORT_SYMBOL_GPL(io_cgrp_subsys);
> >
> > +#ifdef CONFIG_CGROUP_WRITEBACK
> > +struct blkcg_wb_sleeper {
> > +	struct backing_dev_info *bdi;
> > +	struct blkcg *blkcg;
> > +	refcount_t refcnt;
> > +	struct list_head node;
> > +};
> > +
> > +static DEFINE_SPINLOCK(blkcg_wb_sleeper_lock);
> > +static LIST_HEAD(blkcg_wb_sleeper_list);
> > +
> > +static struct blkcg_wb_sleeper *
> > +blkcg_wb_sleeper_find(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->blkcg == blkcg && bws->bdi == bdi)
> > +			return bws;
> > +	return NULL;
> > +}
> > +
> > +static void blkcg_wb_sleeper_add(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_add(&bws->node, &blkcg_wb_sleeper_list);
> > +}
> > +
> > +static void blkcg_wb_sleeper_del(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_del_init(&bws->node);
> > +}
> > +
> > +/**
> > + * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device
> > + * @blkcg: current blkcg cgroup
> > + * @bdi: block device to check
> > + *
> > + * Return true if any blkcg other than the current one is waiting for
> > + * writeback on the target block device, false otherwise.
> > + */
> > +bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +	bool ret = false;
> > +
> > +	spin_lock(&blkcg_wb_sleeper_lock);
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->bdi == bdi && bws->blkcg != blkcg) {
> > +			ret = true;
> > +			break;
> > +		}
> > +	spin_unlock(&blkcg_wb_sleeper_lock);
> > +
> > +	return ret;
> > +}
>
> No global lock please, add something to the bdi I think? Also have a fast path
> of
OK, I'll add a per-bdi list, with its own lock.
>
> 	if (list_empty(&blkcg_wb_sleeper_list))
> 		return false;
OK.
>
> we don't need to be super accurate here. Thanks,
>
> Josef
Thanks,
-Andrea
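
For reference, the two changes agreed above (a waiter list and lock kept
per-bdi instead of globally, plus an unlocked empty-list fast path) could
look roughly like the userspace sketch below. This is only a model under
stated assumptions, not the follow-up patch: struct bdi_model, struct
wb_sleeper, bdi_lock()/bdi_unlock() and the small list helpers are
hypothetical stand-ins for struct backing_dev_info, struct
blkcg_wb_sleeper, spin_lock()/spin_unlock() and <linux/list.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical per-bdi state: each device owns its waiter list and lock. */
struct bdi_model {
	atomic_flag wb_sleeper_lock;	/* models a spinlock */
	struct list_head wb_sleeper_list;
};

struct wb_sleeper {
	const void *blkcg;		/* opaque cgroup identity */
	struct list_head node;
};

/* Map a list node back to the wb_sleeper that embeds it. */
#define sleeper_of(p) \
	((struct wb_sleeper *)((char *)(p) - offsetof(struct wb_sleeper, node)))

static void bdi_lock(struct bdi_model *bdi)
{
	while (atomic_flag_test_and_set_explicit(&bdi->wb_sleeper_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void bdi_unlock(struct bdi_model *bdi)
{
	atomic_flag_clear_explicit(&bdi->wb_sleeper_lock, memory_order_release);
}

static void bdi_model_init(struct bdi_model *bdi)
{
	atomic_flag_clear(&bdi->wb_sleeper_lock);
	list_init(&bdi->wb_sleeper_list);
}

/* Register @bws as a waiter of @blkcg on @bdi (insert at list head). */
static void wb_sleeper_add(struct bdi_model *bdi, struct wb_sleeper *bws,
			   const void *blkcg)
{
	struct list_head *h = &bdi->wb_sleeper_list;

	bws->blkcg = blkcg;
	bdi_lock(bdi);
	bws->node.next = h->next;
	bws->node.prev = h;
	h->next->prev = &bws->node;
	h->next = &bws->node;
	bdi_unlock(bdi);
}

static void wb_sleeper_del(struct bdi_model *bdi, struct wb_sleeper *bws)
{
	bdi_lock(bdi);
	bws->node.prev->next = bws->node.next;
	bws->node.next->prev = bws->node.prev;
	list_init(&bws->node);
	bdi_unlock(bdi);
}

/* True if a blkcg other than @blkcg is waiting for writeback on @bdi. */
static bool wb_waiters_on_bdi(struct bdi_model *bdi, const void *blkcg)
{
	struct list_head *pos;
	bool ret = false;

	/*
	 * Fast path: unlocked and deliberately racy, a stale answer is
	 * tolerated. The kernel's list_empty() does this read with
	 * READ_ONCE(); this model only exercises it single-threaded.
	 */
	if (list_is_empty(&bdi->wb_sleeper_list))
		return false;

	bdi_lock(bdi);
	for (pos = bdi->wb_sleeper_list.next; pos != &bdi->wb_sleeper_list;
	     pos = pos->next) {
		if (sleeper_of(pos)->blkcg != blkcg) {
			ret = true;
			break;
		}
	}
	bdi_unlock(bdi);
	return ret;
}
```

Since the list lives in the bdi, the lookup no longer needs to compare
bws->bdi, and throttling on one device never contends on a lock shared
with another device. The common case (nobody is in sync()) returns
without taking any lock at all.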