From: Weiping Zhang <zhangweiping@didiglobal.com>
To: <axboe@kernel.dk>, <tj@kernel.org>, <hch@lst.de>,
<bvanassche@acm.org>, <keith.busch@intel.com>,
<minwoo.im.dev@gmail.com>
Cc: <linux-block@vger.kernel.org>, <cgroups@vger.kernel.org>,
<linux-nvme@lists.infradead.org>
Subject: [PATCH v3 0/5] Add support Weighted Round Robin for blkcg and nvme
Date: Mon, 24 Jun 2019 22:28:32 +0800
Message-ID: <cover.1561381826.git.zhangweiping@didiglobal.com>
Hi,
This series tries to add Weighted Round Robin (WRR) support to block
cgroup and the nvme driver. When multiple containers share a single
NVMe device, we want to protect IO-critical containers from
interference by other containers. We add a blkio.wrr interface that
lets users control the IO priority of each cgroup. blkio.wrr accepts
five priority levels: "urgent", "high", "medium", "low" and "none";
"none" disables WRR for that cgroup.
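For illustration, a minimal userspace sketch of parsing one blkio.wrr
line in the "MAJ:MIN level" form used in the fio test. The enum, the
struct and wrr_parse() are hypothetical names chosen for this sketch;
they are not the parser in patch 1/5.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical priority levels mirroring the five blkio.wrr keywords. */
enum wrr_prio {
	WRR_NONE,
	WRR_LOW,
	WRR_MEDIUM,
	WRR_HIGH,
	WRR_URGENT,
};

/* Hypothetical parsed form of one "MAJ:MIN level" line. */
struct wrr_setting {
	unsigned int major;
	unsigned int minor;
	enum wrr_prio prio;
};

static const struct {
	const char *name;
	enum wrr_prio prio;
} wrr_names[] = {
	{ "none", WRR_NONE },
	{ "low", WRR_LOW },
	{ "medium", WRR_MEDIUM },
	{ "high", WRR_HIGH },
	{ "urgent", WRR_URGENT },
};

/* Parse e.g. "259:0 high"; returns 0 on success, -1 on malformed input. */
static int wrr_parse(const char *buf, struct wrr_setting *out)
{
	char level[16];
	size_t i;

	/* %15s stops at whitespace, so a trailing newline from echo is fine. */
	if (sscanf(buf, "%u:%u %15s", &out->major, &out->minor, level) != 3)
		return -1;

	for (i = 0; i < sizeof(wrr_names) / sizeof(wrr_names[0]); i++) {
		if (strcmp(level, wrr_names[i].name) == 0) {
			out->prio = wrr_names[i].prio;
			return 0;
		}
	}
	return -1; /* unknown priority keyword */
}
```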
The first patch adds a WRR infrastructure for block cgroup. We add
four extra hardware context types at the blk-mq layer,
HCTX_TYPE_WRR_URGENT/HIGH/MEDIUM/LOW, to allow device drivers to map
different hardware queues to different hardware contexts.
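A sketch of how the four new hardware context types could sit next to
the DEFAULT, READ and POLL types that blk-mq already has, with POLL
kept last as described in the V1 changelog, plus a hypothetical helper
that picks a type for one request. The exact enum layout in patch 1/5
and the precedence in pick_hctx_type() (poll first, then WRR, then
read/default) are assumptions of this sketch.

```c
#include <stdbool.h>

/* Sketch: the four WRR types sit between READ and POLL; POLL stays last. */
enum hctx_type {
	HCTX_TYPE_DEFAULT,
	HCTX_TYPE_READ,
	HCTX_TYPE_WRR_URGENT,
	HCTX_TYPE_WRR_HIGH,
	HCTX_TYPE_WRR_MEDIUM,
	HCTX_TYPE_WRR_LOW,
	HCTX_TYPE_POLL,
	HCTX_MAX_TYPES,
};

/* Hypothetical cgroup priority as configured through blkio.wrr. */
enum wrr_prio { WRR_NONE, WRR_URGENT, WRR_HIGH, WRR_MEDIUM, WRR_LOW };

/* Pick the hctx type for one request (assumed precedence, see above). */
static enum hctx_type pick_hctx_type(enum wrr_prio prio, bool polled,
				     bool is_read)
{
	if (polled)
		return HCTX_TYPE_POLL;

	switch (prio) {
	case WRR_URGENT:
		return HCTX_TYPE_WRR_URGENT;
	case WRR_HIGH:
		return HCTX_TYPE_WRR_HIGH;
	case WRR_MEDIUM:
		return HCTX_TYPE_WRR_MEDIUM;
	case WRR_LOW:
		return HCTX_TYPE_WRR_LOW;
	case WRR_NONE:
	default:
		break;
	}
	/* WRR disabled for this cgroup: fall back to the read/default split. */
	return is_read ? HCTX_TYPE_READ : HCTX_TYPE_DEFAULT;
}
```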
The second patch adds an nvme_ctrl_ops callback named get_ams to get
the expected Arbitration Mechanism Selected (AMS); for now this series
only supports nvme-pci. This operation checks both CAP.AMS and the
nvme-pci WRR queue count to decide whether to enable WRR or RR.
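A minimal sketch of the decision get_ams is described as making,
assuming the NVMe base specification bit layouts: CAP.AMS occupies CAP
bits 18:17, where bit 17 advertises Weighted Round Robin with Urgent
Priority Class, and CC.AMS occupies CC bits 13:11, where 001b selects
it. choose_ams() and the macro names are hypothetical, not the
driver's identifiers.

```c
#include <stdint.h>

/* CAP bit 17: "Weighted Round Robin with Urgent Priority Class" supported. */
#define CAP_AMS_WRRU (1ULL << 17)

/* CC.AMS values (CC bits 13:11), already shifted into place. */
#define CC_AMS_RR   (0U << 11)
#define CC_AMS_WRRU (1U << 11)

/* Hypothetical stand-in for get_ams: use WRR only if the controller
 * advertises it and at least one WRR queue was requested. */
static uint32_t choose_ams(uint64_t cap, unsigned int nr_wrr_queues)
{
	if ((cap & CAP_AMS_WRRU) && nr_wrr_queues > 0)
		return CC_AMS_WRRU;
	return CC_AMS_RR;
}
```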
The third patch renames the write_queues module parameter to
read_queues, which simplifies calculating the number of default, read,
poll and WRR queues.
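A simplified, hypothetical illustration of the split this enables:
every queue type other than default takes exactly its requested count,
and default takes whatever is left. The enum, split_queues() and the
fallback used when the requests exceed the available queues are
assumptions of this sketch, not the logic in pci.c.

```c
/* Hypothetical indices into a per-type queue-count array. */
enum qtype {
	Q_DEFAULT, Q_READ, Q_POLL,
	Q_WRR_URGENT, Q_WRR_HIGH, Q_WRR_MEDIUM, Q_WRR_LOW,
	Q_NR
};

/*
 * Split nr_io_queues: every type except Q_DEFAULT gets exactly what was
 * requested and Q_DEFAULT gets the remainder. If the requests leave no
 * queue for Q_DEFAULT, fall back to putting everything there.
 */
static void split_queues(unsigned int nr_io_queues,
			 const unsigned int requested[Q_NR],
			 unsigned int out[Q_NR])
{
	unsigned int used = 0;
	int t;

	for (t = Q_READ; t < Q_NR; t++)
		used += requested[t];

	if (used >= nr_io_queues) {
		for (t = 0; t < Q_NR; t++)
			out[t] = 0;
		out[Q_DEFAULT] = nr_io_queues;
		return;
	}

	out[Q_DEFAULT] = nr_io_queues - used;
	for (t = Q_READ; t < Q_NR; t++)
		out[t] = requested[t];
}
```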
The fourth patch skips empty affinity sets, because nvme may have 7
affinity sets and some of them may be empty.
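A hypothetical sketch of the idea: lay out interrupt vectors set by set
and skip any set whose size is zero, instead of trying to spread zero
vectors over the CPUs. layout_sets() is an illustrative name; the real
change lives in kernel/irq/affinity.c and may differ.

```c
/*
 * Assign consecutive interrupt vectors to each affinity set.
 * first_vec[i] receives the first vector of set i, or -1 if the set is
 * empty (e.g. no "urgent" WRR queues were requested).
 * Returns the total number of vectors used.
 */
static unsigned int layout_sets(const unsigned int *set_size,
				unsigned int nr_sets, int *first_vec)
{
	unsigned int i, next = 0;

	for (i = 0; i < nr_sets; i++) {
		if (set_size[i] == 0) {
			first_vec[i] = -1;	/* empty set: nothing to spread */
			continue;
		}
		first_vec[i] = (int)next;
		next += set_size[i];
	}
	return next;
}
```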
The last patch adds nvme-pci support for Weighted Round Robin with
Urgent Priority Class. We add four module parameters as follows:
wrr_urgent_queues
wrr_high_queues
wrr_medium_queues
wrr_low_queues
nvme-pci sets CC.AMS=001b if CAP.AMS[17]=1 and wrr_xxx_queues is
larger than 0. The nvme driver splits the hardware queues based on
read/poll/wrr_xxx_queues, then sets the proper value for Queue Priority
(QPRIO) in DWORD11 of the Create I/O Submission Queue command.
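A sketch of the QPRIO encoding, assuming the Create I/O Submission
Queue CDW11 layout from the NVMe base specification: bit 0 is PC
(physically contiguous), bits 2:1 are QPRIO (00b urgent, 01b high, 10b
medium, 11b low) and bits 31:16 are the completion queue ID. The
kernel's include/linux/nvme.h has equivalent NVME_SQ_PRIO_* constants;
the SQ_* macros and create_sq_cdw11() below are local, hypothetical
names.

```c
#include <stdint.h>

/* Create I/O Submission Queue, Command Dword 11 fields. */
#define SQ_PHYS_CONTIG  (1U << 0)	/* PC */
#define SQ_PRIO_URGENT  (0U << 1)	/* QPRIO = 00b */
#define SQ_PRIO_HIGH    (1U << 1)	/* QPRIO = 01b */
#define SQ_PRIO_MEDIUM  (2U << 1)	/* QPRIO = 10b */
#define SQ_PRIO_LOW     (3U << 1)	/* QPRIO = 11b */

/* Build CDW11 for a physically contiguous SQ bound to completion queue cqid. */
static uint32_t create_sq_cdw11(uint16_t cqid, uint32_t prio)
{
	return ((uint32_t)cqid << 16) | prio | SQ_PHYS_CONTIG;
}
```

QPRIO only takes effect when CC.AMS selects Weighted Round Robin with
Urgent Priority Class; under plain RR the controller ignores it.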
fio test:
CPU: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
NVME: Intel SSDPE2KX020T8 P4510 2TB
[root@tmp-201812-d1802-818396173 low]# nvme show-regs /dev/nvme0n1
cap : 2078030fff
version : 10200
intms : 0
intmc : 0
cc : 460801
csts : 1
nssr : 0
aqa : 1f001f
asq : 5f7cc08000
acq : 5f5ac23000
cmbloc : 0
cmbsz : 0
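The show-regs output above can be cross-checked against the NVMe
register layout. A small sketch, assuming nvme-cli prints these
registers in hexadecimal and using the NVMe base specification field
positions: CAP.MQES is bits 15:0 (zero-based), CAP.AMS bits 18:17,
CC.EN bit 0, CC.AMS bits 13:11, CC.IOSQES bits 19:16 and CC.IOCQES bits
23:20. The helper names are hypothetical.

```c
#include <stdint.h>

/* CAP fields */
static unsigned int cap_mqes(uint64_t cap) { return (unsigned int)(cap & 0xffff) + 1; } /* zero-based */
static unsigned int cap_ams(uint64_t cap)  { return (unsigned int)((cap >> 17) & 0x3); }

/* CC fields */
static unsigned int cc_en(uint32_t cc)     { return cc & 0x1; }
static unsigned int cc_ams(uint32_t cc)    { return (cc >> 11) & 0x7; }
static unsigned int cc_iosqes(uint32_t cc) { return (cc >> 16) & 0xf; } /* log2 bytes */
static unsigned int cc_iocqes(uint32_t cc) { return (cc >> 20) & 0xf; } /* log2 bytes */
```

For cap = 0x2078030fff this gives MQES = 4096 entries and AMS = 01b
(WRR with Urgent Priority Class supported); for cc = 0x460801 it gives
EN = 1, AMS = 001b (WRR selected), IOSQES = 6 (64-byte SQ entries) and
IOCQES = 4 (16-byte CQ entries), consistent with the controller running
in WRR mode when the registers were dumped.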
Run fio-1, fio-2 and fio-3 in parallel. With RR (round robin), the
three fio jobs get nearly the same IOPS/bandwidth; with different
blkio.wrr priorities set, the WRR "high" job gets more IOPS/bandwidth
than "medium" and "low".
RR:
fio-1: echo "259:0 none" > /sys/fs/cgroup/blkio/high/blkio.wrr
fio-2: echo "259:0 none" > /sys/fs/cgroup/blkio/medium/blkio.wrr
fio-3: echo "259:0 none" > /sys/fs/cgroup/blkio/low/blkio.wrr
WRR:
fio-1: echo "259:0 high" > /sys/fs/cgroup/blkio/high/blkio.wrr
fio-2: echo "259:0 medium" > /sys/fs/cgroup/blkio/medium/blkio.wrr
fio-3: echo "259:0 low" > /sys/fs/cgroup/blkio/low/blkio.wrr
rwtest=randread
fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
Randread 4K (IOPS)       RR          WRR
-------------------------------------------------------
fio-1 (high):            220k        395k
fio-2 (medium):          220k        197k
fio-3 (low):             220k         66k
rwtest=randwrite
fio --bs=4k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
Randwrite 4K (IOPS)      RR          WRR
-------------------------------------------------------
fio-1 (high):            150k        295k
fio-2 (medium):          150k        148k
fio-3 (low):             150k         51k
rwtest=read
fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
Read 512K (MiB/s)        RR          WRR
-------------------------------------------------------
fio-1 (high):            963        1704
fio-2 (medium):          950         850
fio-3 (low):             961         284
rwtest=write
fio --bs=512k --ioengine=libaio --iodepth=32 --filename=/dev/nvme0n1 --direct=1 --runtime=60 --numjobs=8 --rw=$rwtest --name=test$1 --group_reporting
Write 512K (MiB/s)       RR          WRR
-------------------------------------------------------
fio-1 (high):            890        1150
fio-2 (medium):          871         595
fio-3 (low):             895         188
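To express the WRR columns in relative terms, a small helper can
compute each job's rounded share of the aggregate throughput in one
table column. share_pct() is an illustrative name; its inputs are just
the numbers reported in the tables above.

```c
/* Percentage of the column total received by one job, rounded to the
 * nearest whole percent. */
static unsigned int share_pct(unsigned int part, const unsigned int *all,
			      unsigned int n)
{
	unsigned int i, total = 0;

	for (i = 0; i < n; i++)
		total += all[i];
	return (part * 100 + total / 2) / total;
}
```

Applied to the WRR columns, this gives 60/30/10 percent for randread
4K, randwrite 4K and read 512K, and 59/31/10 percent for write 512K,
while each RR column splits roughly evenly, about a third per job.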
Changes since V2:
* drop the null_blk patch, which added a new NULL_Q_IRQ_WRR to
simulate the nvme WRR policy
* add an urgent tagset map for the nvme driver
* fix some problems in V2, as suggested by Minwoo
Changes since V1:
* reorder HCTX_TYPE_POLL to be the last type, to make it easier for
the nvme driver to adopt
* add WRR (Weighted Round Robin) support for the nvme driver
Weiping Zhang (5):
block: add weighted round robin for blkcgroup
nvme: add get_ams for nvme_ctrl_ops
nvme-pci: rename module parameter write_queues to read_queues
genirq/affinity: allow driver's discontigous affinity set
nvme: add support weighted round robin queue
block/blk-cgroup.c | 89 ++++++++++++++++
block/blk-mq-debugfs.c | 4 +
block/blk-mq-sched.c | 6 +-
block/blk-mq-tag.c | 4 +-
block/blk-mq-tag.h | 2 +-
block/blk-mq.c | 12 ++-
block/blk-mq.h | 20 +++-
block/blk.h | 2 +-
drivers/nvme/host/core.c | 9 +-
drivers/nvme/host/nvme.h | 2 +
drivers/nvme/host/pci.c | 246 ++++++++++++++++++++++++++++++++++++---------
include/linux/blk-cgroup.h | 2 +
include/linux/blk-mq.h | 14 +++
include/linux/interrupt.h | 2 +-
include/linux/nvme.h | 3 +
kernel/irq/affinity.c | 4 +
16 files changed, 362 insertions(+), 59 deletions(-)
--
2.14.1