From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Jens Axboe <axboe@kernel.dk>,
Douglas Gilbert <dgilbert@interlog.com>
Subject: Re: [bug report] shared tags causes IO hang and performance drop
Date: Thu, 15 Apr 2021 11:46:59 +0800
Message-ID: <YHe3M62agQET6o6O@T590>
In-Reply-To: <87ceccf2-287b-9bd1-899a-f15026c9e65b@huawei.com>
On Wed, Apr 14, 2021 at 01:06:25PM +0100, John Garry wrote:
> On 14/04/2021 12:12, Ming Lei wrote:
> > On Wed, Apr 14, 2021 at 04:12:22PM +0530, Kashyap Desai wrote:
> > > > Hi Ming,
> > > >
> > > > > It is reported inside RH that CPU utilization is increased ~20% when
> > > > > running simple FIO test inside VM which disk is built on image stored
> > > > > on XFS/megaraid_sas.
> > > > >
> > > > > When I try to investigate by reproducing the issue via scsi_debug, I
> > > > > found IO hang when running randread IO(8k, direct IO, libaio) on
> > > > > scsi_debug disk created by the following command:
> > > > >
> > > > > > modprobe scsi_debug host_max_queue=128 submit_queues=$NR_CPUS virtual_gb=256
> > > > >
> > > > > So I can recreate this hang for using mq-deadline IO sched for scsi
> > > > > debug, in that fio does not exit. I'm using v5.12-rc7.
> > > I can also recreate this issue using mq-deadline. Using <none>, there is no
> > > IO hang issue.
> > > Also if I run script to change scheduler periodically (none, mq-deadline),
> > > sysfs entry hangs.
> > >
> > > Here is call trace-
> > > Call Trace:
> > > [ 1229.879862] __schedule+0x29d/0x7a0
> > > [ 1229.879871] schedule+0x3c/0xa0
> > > [ 1229.879875] blk_mq_freeze_queue_wait+0x62/0x90
> > > [ 1229.879880] ? finish_wait+0x80/0x80
> > > [ 1229.879884] elevator_switch+0x12/0x40
> > > [ 1229.879888] elv_iosched_store+0x79/0x120
> > > [ 1229.879892] ? kernfs_fop_write_iter+0xc7/0x1b0
> > > [ 1229.879897] queue_attr_store+0x42/0x70
> > > [ 1229.879901] kernfs_fop_write_iter+0x11f/0x1b0
> > > [ 1229.879905] new_sync_write+0x11f/0x1b0
> > > [ 1229.879912] vfs_write+0x184/0x250
> > > [ 1229.879915] ksys_write+0x59/0xd0
> > > [ 1229.879917] do_syscall_64+0x33/0x40
> > > [ 1229.879922] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > >
> > > I tried both - 5.12.0-rc1 and 5.11.0-rc2+ and there is a same behavior.
> > > Let me also check megaraid_sas and see if anything generic or this is a
> > > special case of scsi_debug.
> > As I mentioned, it could be one generic issue wrt. SCHED_RESTART.
> > shared tags might have to restart all hctx since all share same tags.
>
> I tested on hisi_sas v2 hw (which now sets host_tagset), and can reproduce.
> Seems to be combination of mq-deadline and fio rw=randread settings required
> to reproduce from limited experiments.
>
> Incidentally, about the mq-deadline vs none IO scheduler on the same host, I
> get this with 6x SAS SSD:
>
> rw=read
>               CPU util                  IOPs
> mq-deadline   usr=26.80%, sys=52.78%    650K
> none          usr=22.99%, sys=74.10%    475K
>
> rw=randread
>               CPU util                  IOPs
> mq-deadline   usr=21.72%, sys=44.18%    423K
> none          usr=23.15%, sys=74.01%    450K
Today I re-ran the scsi_debug test on two servers (32 cores, dual NUMA
nodes), and the CPU utilization issue can be reproduced. The test results
follow:
1) randread test on ibm-x3850x6[*] with deadline

              | IOPS | FIO CPU util
------------------------------------------------
hosttags      | 94k  | usr=1.13%, sys=14.75%
------------------------------------------------
non hosttags  | 124k | usr=1.12%, sys=10.65%
------------------------------------------------

2) randread test on ibm-x3850x6[*] with none

              | IOPS | FIO CPU util
------------------------------------------------
hosttags      | 120k | usr=0.89%, sys=6.55%
------------------------------------------------
non hosttags  | 121k | usr=1.07%, sys=7.35%
------------------------------------------------
[*]:
- this is the machine on which Yanhui reported that VM CPU utilization
  increased by 20%
- kernel: latest linus tree (v5.12-rc7, commit: 7f75285ca57)
- I also ran the same test on another 32-core machine; no IOPS drop is
  observed there, but CPU utilization is clearly increased
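
To put the deadline numbers from table 1 in relative terms, here is a small
sketch that computes the percentage difference between the hosttags and
non hosttags rows. It is not part of the test itself, and the helper name
pct_change is made up for illustration.

```shell
#!/bin/bash
# pct_change NEW OLD: print the relative change of NEW versus OLD, in percent.
# Hypothetical helper for illustration only, not part of the test script.
pct_change() {
    awk -v new="$1" -v old="$2" \
        'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Figures copied from table 1 (randread, deadline): hosttags vs non hosttags.
echo "IOPS:    $(pct_change 94 124)%"
echo "sys CPU: $(pct_change 14.75 10.65)%"
```

With the table 1 figures this gives about -24.2% for IOPS and +38.5% for
sys CPU time when hosttags is used with deadline.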
3) test script

#!/bin/bash

# run_fio RUNTIME JOBS DEVICE BLOCKSIZE
# randread, direct IO, libaio, with batched submission and completion
run_fio() {
	RTIME=$1
	JOBS=$2
	DEVS=$3
	BS=$4

	QD=64
	BATCH=16

	fio --bs=$BS --ioengine=libaio \
		--iodepth=$QD \
		--iodepth_batch_submit=$BATCH \
		--iodepth_batch_complete_min=$BATCH \
		--filename=$DEVS \
		--direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
		--name=test --group_reporting
}

# scheduler to test, passed as the first argument (e.g. mq-deadline or none)
SCHED=$1
NRQS=$(getconf _NPROCESSORS_ONLN)

# hosttags case: one submit queue per online CPU, sharing host_max_queue tags
rmmod scsi_debug
modprobe scsi_debug host_max_queue=128 submit_queues=$NRQS virtual_gb=256
sleep 2
DEV=$(lsscsi | grep scsi_debug | awk '{print $6}')
echo $SCHED > /sys/block/$(basename $DEV)/queue/scheduler
echo 128 > /sys/block/$(basename $DEV)/device/queue_depth
run_fio 20 16 $DEV 8K

# non hosttags case: a single submit queue with max_queue=128
rmmod scsi_debug
modprobe scsi_debug max_queue=128 submit_queues=1 virtual_gb=256
sleep 2
DEV=$(lsscsi | grep scsi_debug | awk '{print $6}')
echo $SCHED > /sys/block/$(basename $DEV)/queue/scheduler
echo 128 > /sys/block/$(basename $DEV)/device/queue_depth
run_fio 20 16 $DEV 8k
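
Since the results differ so much between deadline and none, it may be worth
confirming that the scheduler write took effect before fio starts. Below is
a minimal sketch, assuming the usual sysfs format in which the active
scheduler is shown in square brackets. The helper names active_sched and
check_sched are hypothetical and not part of the script above.

```shell
#!/bin/bash
# active_sched LINE: print the bracketed (active) entry from the contents of
# /sys/block/<dev>/queue/scheduler, e.g. "[mq-deadline] kyber bfq none".
# Hypothetical helper, not part of the original script.
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# check_sched DEV WANT: succeed only if DEV's active scheduler equals WANT.
check_sched() {
    local cur
    cur=$(active_sched "$(cat "/sys/block/$(basename "$1")/queue/scheduler")")
    [ "$cur" = "$2" ]
}
```

It could be called as `check_sched $DEV $SCHED || exit 1` right after the
scheduler is written and before run_fio.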
Thanks,
Ming