* [PATCH 0/3] zram: queue flag nowait and minor cleanup
@ 2023-05-12 8:29 Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT Chaitanya Kulkarni
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-12 8:29 UTC (permalink / raw)
To: minchan, senozhatsky; +Cc: axboe, linux-block, Chaitanya Kulkarni
Hi Minchan/Sergey,
Consolidate zram_bio_read() and zram_bio_write() into one function,
add the missing flush_dcache_page() before zram_bvec_write(), and
allow the user to set QUEUE_FLAG_NOWAIT to get better performance
with io_uring :-
* linux-block (for-next) # grep IOPS zram*fio | column -t
default-nowait-off-1.fio: read: IOPS=792k, BW=3095MiB/s
default-nowait-off-2.fio: read: IOPS=812k, BW=3173MiB/s
default-nowait-off-3.fio: read: IOPS=803k, BW=3137MiB/s
nowait-on-1.fio: read: IOPS=864k, BW=3374MiB/s
nowait-on-2.fio: read: IOPS=863k, BW=3371MiB/s
nowait-on-3.fio: read: IOPS=864k, BW=3374MiB/s
* linux-block (for-next) # grep cpu zram*fio | column -t
default-nowait-off-1.fio: cpu : usr=5.92%, sys=13.05%, ctx=37784503
default-nowait-off-2.fio: cpu : usr=6.07%, sys=13.66%, ctx=37821095
default-nowait-off-3.fio: cpu : usr=6.06%, sys=13.33%, ctx=38395321
nowait-on-1.fio: cpu : usr=1.78%, sys=97.58%, ctx=23848,
nowait-on-2.fio: cpu : usr=1.78%, sys=97.61%, ctx=22537,
nowait-on-3.fio: cpu : usr=1.80%, sys=97.60%, ctx=21582,
* linux-block (for-next) # grep slat zram*fio | column -t
default-nowait-off-1.fio: slat (nsec): min=410, max=4827.1k, avg=1868.3
default-nowait-off-2.fio: slat (nsec): min=411, max=5335.4k, avg=1953.2
default-nowait-off-3.fio: slat (nsec): min=420, max=5102.4k, avg=1852.9
nowait-on-1.fio: slat (nsec): min=1072,max=33725k, avg=54834.7
nowait-on-2.fio: slat (nsec): min=972, max=16048k, avg=54874.8
nowait-on-3.fio: slat (nsec): min=1042,max=7085.1k, avg=54823.7
Please let me know if further testing is needed; I've run an fio
verification job in order to verify these changes.
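For reference, the kind of I/O driving these numbers is plain io_uring reads
against /dev/zram0, roughly like the minimal userspace sketch below
(illustrative only, not part of the series; the device path and the 4KiB size
just mirror the fio jobs above):
#define _GNU_SOURCE
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	void *buf;
	int fd, ret;
	/* O_DIRECT reads from the zram block device, as the fio jobs do */
	fd = open("/dev/zram0", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	ret = io_uring_queue_init(16, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}
	/* a single 4KiB read; fio issues millions of these per run */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, 4096, 0);
	io_uring_submit(&ring);
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret)
		printf("read: %d bytes\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	free(buf);
	close(fd);
	return 0;
}
With QUEUE_FLAG_NOWAIT set, io_uring can complete such reads inline on the
submitting CPU instead of punting them to io-wq workers, which is consistent
with the large drop in context switches (ctx=) above.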
-ck
Chaitanya Kulkarni (3):
zram: allow user to set QUEUE_FLAG_NOWAIT
zram: consolidate zram_bio_read()_zram_bio_write()
zram: add flush_dcache_page() call for write
drivers/block/zram/zram_drv.c | 63 +++++++++++++++--------------------
1 file changed, 27 insertions(+), 36 deletions(-)
linux-block (for-next) # sh test-zram-nowait.sh
+ git log -4
commit f6654c00ddea9b02952ac94551e0d7469fc8e8d8 (HEAD -> for-next)
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Thu May 11 23:50:37 2023 -0700
zram: add flush_dcache_page() call for write
Just like we have a flush_dcache_page() call after zram_bvec_read(), add
the missing flush_dcache_page() call before zram_bvec_write() in order to
handle the cache coherency of the kernel and userspace mappings of the
page when zram does a write.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
commit bfe3e67f81284663bf84bc5121c27bf90410d433
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Thu May 11 23:46:43 2023 -0700
zram: consolidate zram_bio_read()_zram_bio_write()
zram_bio_read() and zram_bio_write() are 26 lines each and share most of
the code except for the zram_bvec_read() and zram_bvec_write() calls.
Consolidate the code into a single zram_bio_rw() to remove the duplicate
code and an extra function that is only needed for 2 lines of code :-
1c1
< static void zram_bio_read(struct zram *zram, struct bio *bio)
commit 6f5b4f6dec19464158ead387739ceb72b97e991c
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Thu May 11 23:03:32 2023 -0700
zram: allow user to set QUEUE_FLAG_NOWAIT
Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
parameter to retain the default behaviour. Also, update respective
allocation flags in the write path. Following are the performance
numbers with io_uring fio engine for random read, note that device has
been populated fully with randwrite workload before taking these
numbers :-
* linux-block (for-next) # grep IOPS zram*fio | column -t
default-nowait-off-1.fio: read: IOPS=792k, BW=3095MiB/s
default-nowait-off-2.fio: read: IOPS=812k, BW=3173MiB/s
default-nowait-off-3.fio: read: IOPS=803k, BW=3137MiB/s
nowait-on-1.fio: read: IOPS=864k, BW=3374MiB/s
nowait-on-2.fio: read: IOPS=863k, BW=3371MiB/s
nowait-on-3.fio: read: IOPS=864k, BW=3374MiB/s
* linux-block (for-next) # grep cpu zram*fio | column -t
default-nowait-off-1.fio: cpu : usr=5.92%, sys=13.05%, ctx=37784503
default-nowait-off-2.fio: cpu : usr=6.07%, sys=13.66%, ctx=37821095
default-nowait-off-3.fio: cpu : usr=6.06%, sys=13.33%, ctx=38395321
nowait-on-1.fio: cpu : usr=1.78%, sys=97.58%, ctx=23848,
nowait-on-2.fio: cpu : usr=1.78%, sys=97.61%, ctx=22537,
nowait-on-3.fio: cpu : usr=1.80%, sys=97.60%, ctx=21582,
* linux-block (for-next) # grep slat zram*fio | column -t
default-nowait-off-1.fio: slat (nsec): min=410, max=4827.1k, avg=1868.3
default-nowait-off-2.fio: slat (nsec): min=411, max=5335.4k, avg=1953.2
default-nowait-off-3.fio: slat (nsec): min=420, max=5102.4k, avg=1852.9
nowait-on-1.fio: slat (nsec): min=1072,max=33725k, avg=54834.7
nowait-on-2.fio: slat (nsec): min=972, max=16048k, avg=54874.8
nowait-on-3.fio: slat (nsec): min=1042,max=7085.1k, avg=54823.7
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
commit 6622cc4e0c7139d5173ae5f115a7d87e45a51c1a
Merge: 2742538c4d28 7387653a6022
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date: Thu May 11 22:57:28 2023 -0700
Merge branch 'for-next' of git://git.kernel.dk/linux-block into for-next
+ ./compile_zram.sh
+ umount /mnt/zram
umount: /mnt/zram: no mount point specified.
+ dmesg -c
+ modprobe -r zram
+ lsmod
+ grep zram
++ nproc
+ make -j 48 M=drivers/block modules
+ HOST=drivers/block/zram/zram.ko
++ uname -r
+ HOST_DEST=/lib/modules/6.4.0-rc1lblk+/kernel/drivers/block/zram/
+ cp drivers/block/zram/zram.ko /lib/modules/6.4.0-rc1lblk+/kernel/drivers/block/zram//
+ ls -lrth /lib/modules/6.4.0-rc1lblk+/kernel/drivers/block/zram//zram.ko
-rw-r--r--. 1 root root 676K May 12 00:23 /lib/modules/6.4.0-rc1lblk+/kernel/drivers/block/zram//zram.ko
+ dmesg -c
+ lsmod
+ grep zram
+ modprobe zram
+ sleep 1
+ zramctl -s 20GB /dev/zram0
+ sleep 1
+ test_nowait default-nowait-off
+ fio fio/verify.fio --ioengine=io_uring --size=2G --filename=/dev/zram0
write-and-verify: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
fio-3.34
Starting 1 process
Jobs: 1 (f=1): [V(1)][100.0%][r=995MiB/s][r=255k IOPS][eta 00m:00s]
write-and-verify: (groupid=0, jobs=1): err= 0: pid=176756: Fri May 12 00:23:34 2023
read: IOPS=255k, BW=995MiB/s (1043MB/s)(1295MiB/1302msec)
slat (nsec): min=922, max=37201, avg=2879.25, stdev=1196.56
clat (usec): min=3, max=134, avg=58.98, stdev= 5.43
lat (usec): min=6, max=137, avg=61.85, stdev= 5.61
clat percentiles (usec):
| 1.00th=[ 48], 5.00th=[ 51], 10.00th=[ 53], 20.00th=[ 55],
| 30.00th=[ 57], 40.00th=[ 58], 50.00th=[ 59], 60.00th=[ 60],
| 70.00th=[ 62], 80.00th=[ 63], 90.00th=[ 67], 95.00th=[ 69],
| 99.00th=[ 76], 99.50th=[ 78], 99.90th=[ 86], 99.95th=[ 92],
| 99.99th=[ 103]
write: IOPS=201k, BW=785MiB/s (823MB/s)(2048MiB/2609msec); 0 zone resets
slat (nsec): min=1132, max=148451, avg=4590.62, stdev=2151.57
clat (usec): min=19, max=250, avg=74.77, stdev=11.44
lat (usec): min=23, max=338, avg=79.36, stdev=11.92
clat percentiles (usec):
| 1.00th=[ 54], 5.00th=[ 61], 10.00th=[ 65], 20.00th=[ 69],
| 30.00th=[ 71], 40.00th=[ 73], 50.00th=[ 75], 60.00th=[ 76],
| 70.00th=[ 78], 80.00th=[ 81], 90.00th=[ 86], 95.00th=[ 91],
| 99.00th=[ 116], 99.50th=[ 145], 99.90th=[ 165], 99.95th=[ 176],
| 99.99th=[ 204]
bw ( KiB/s): min=161184, max=886504, per=86.97%, avg=699050.67, stdev=266603.81, samples=6
iops : min=40296, max=221626, avg=174762.67, stdev=66650.95, samples=6
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=1.57%, 100=97.06%
lat (usec) : 250=1.37%, 500=0.01%
cpu : usr=42.84%, sys=56.55%, ctx=4056, majf=0, minf=9079
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=331621,524288,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=995MiB/s (1043MB/s), 995MiB/s-995MiB/s (1043MB/s-1043MB/s), io=1295MiB (1358MB), run=1302-1302msec
WRITE: bw=785MiB/s (823MB/s), 785MiB/s-785MiB/s (823MB/s-823MB/s), io=2048MiB (2147MB), run=2609-2609msec
Disk stats (read/write):
zram0: ios=327141/524288, merge=0/0, ticks=484/1980, in_queue=2464, util=97.57%
+ fio fio/randwrite.fio --ioengine=io_uring --size=20G --filename=/dev/zram0
RANDWRITE: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=2
...
fio-3.34
Starting 48 processes
Jobs: 48 (f=48): [w(48)][100.0%][w=714MiB/s][w=183k IOPS][eta 00m:00s]
RANDWRITE: (groupid=0, jobs=48): err= 0: pid=176785: Fri May 12 00:24:34 2023
write: IOPS=214k, BW=836MiB/s (877MB/s)(49.0GiB/60002msec); 0 zone resets
slat (nsec): min=381, max=1706.8k, avg=2462.04, stdev=2710.88
clat (nsec): min=221, max=19317k, avg=445142.67, stdev=614360.19
lat (usec): min=8, max=19319, avg=447.60, stdev=614.51
clat percentiles (usec):
| 1.00th=[ 21], 5.00th=[ 26], 10.00th=[ 31], 20.00th=[ 43],
| 30.00th=[ 55], 40.00th=[ 74], 50.00th=[ 251], 60.00th=[ 465],
| 70.00th=[ 515], 80.00th=[ 725], 90.00th=[ 1020], 95.00th=[ 1418],
| 99.00th=[ 3228], 99.50th=[ 4015], 99.90th=[ 5407], 99.95th=[ 6063],
| 99.99th=[ 7570]
bw ( KiB/s): min=479679, max=1759836, per=100.00%, avg=857832.81, stdev=5081.52, samples=5712
iops : min=119913, max=439954, avg=214451.85, stdev=1270.39, samples=5712
lat (nsec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.86%, 50=25.67%
lat (usec) : 100=18.97%, 250=4.47%, 500=17.37%, 750=12.90%, 1000=8.96%
lat (msec) : 2=8.51%, 4=1.77%, 10=0.50%, 20=0.01%
cpu : usr=5.18%, sys=9.25%, ctx=9890182, majf=0, minf=579
IO depths : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12840341,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2
Run status group 0 (all jobs):
WRITE: bw=836MiB/s (877MB/s), 836MiB/s-836MiB/s (877MB/s-877MB/s), io=49.0GiB (52.6GB), run=60002-60002msec
Disk stats (read/write):
zram0: ios=0/12822005, merge=0/0, ticks=0/2558435, in_queue=2558435, util=99.96%
+ for i in 1 2 3
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-default-nowait-off-1.fio
+ for i in 1 2 3 [r(48)][100.0%][r=3138MiB/s][r=803k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-default-nowait-off-2.fio
+ for i in 1 2 3 [r(48)][100.0%][r=3027MiB/s][r=775k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-default-nowait-off-3.fio
+ modprobe -r zramr(48)][100.0%][r=3074MiB/s][r=787k IOPS][eta 00m:00s]
+ modprobe zram nowait=1
+ sleep 1
+ zramctl -s 20GB /dev/zram0
+ sleep 1
+ test_nowait nowait-on
+ fio fio/verify.fio --ioengine=io_uring --size=2G --filename=/dev/zram0
write-and-verify: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
fio-3.34
Starting 1 process
Jobs: 1 (f=1): [V(1)][80.0%][r=1192MiB/s,w=124MiB/s][r=305k,w=31.8k IOPS][eta 00m:01s]
write-and-verify: (groupid=0, jobs=1): err= 0: pid=177580: Fri May 12 00:27:46 2023
read: IOPS=386k, BW=1507MiB/s (1581MB/s)(1293MiB/858msec)
slat (nsec): min=1242, max=44093, avg=1777.71, stdev=460.23
clat (nsec): min=721, max=92537, avg=38932.85, stdev=2677.45
lat (nsec): min=2494, max=95692, avg=40710.56, stdev=2756.19
clat percentiles (nsec):
| 1.00th=[36608], 5.00th=[37120], 10.00th=[37632], 20.00th=[37632],
| 30.00th=[38144], 40.00th=[38144], 50.00th=[38144], 60.00th=[38656],
| 70.00th=[38656], 80.00th=[39168], 90.00th=[39680], 95.00th=[44800],
| 99.00th=[49920], 99.50th=[54016], 99.90th=[64256], 99.95th=[65280],
| 99.99th=[79360]
write: IOPS=169k, BW=660MiB/s (692MB/s)(2048MiB/3104msec); 0 zone resets
slat (usec): min=3, max=1097, avg= 5.59, stdev= 2.13
clat (nsec): min=601, max=1389.5k, avg=88874.27, stdev=12749.86
lat (usec): min=5, max=1402, avg=94.46, stdev=13.41
clat percentiles (usec):
| 1.00th=[ 69], 5.00th=[ 73], 10.00th=[ 76], 20.00th=[ 80],
| 30.00th=[ 84], 40.00th=[ 87], 50.00th=[ 90], 60.00th=[ 92],
| 70.00th=[ 95], 80.00th=[ 97], 90.00th=[ 101], 95.00th=[ 104],
| 99.00th=[ 116], 99.50th=[ 128], 99.90th=[ 167], 99.95th=[ 186],
| 99.99th=[ 241]
bw ( KiB/s): min=122368, max=788104, per=88.69%, avg=599186.29, stdev=219358.99, samples=7
iops : min=30592, max=197026, avg=149796.57, stdev=54839.75, samples=7
lat (nsec) : 750=0.01%
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=38.32%, 100=55.08%
lat (usec) : 250=6.60%, 500=0.01%
lat (msec) : 2=0.01%
cpu : usr=35.09%, sys=64.76%, ctx=6, majf=0, minf=9067
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=331086,524288,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=1507MiB/s (1581MB/s), 1507MiB/s-1507MiB/s (1581MB/s-1581MB/s), io=1293MiB (1356MB), run=858-858msec
WRITE: bw=660MiB/s (692MB/s), 660MiB/s-660MiB/s (692MB/s-692MB/s), io=2048MiB (2147MB), run=3104-3104msec
Disk stats (read/write):
zram0: ios=304865/524288, merge=0/0, ticks=262/1514, in_queue=1776, util=97.57%
+ fio fio/randwrite.fio --ioengine=io_uring --size=20G --filename=/dev/zram0
RANDWRITE: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=2
...
fio-3.34
Starting 48 processes
Jobs: 48 (f=48): [w(48)][100.0%][w=759MiB/s][w=194k IOPS][eta 00m:00s]
RANDWRITE: (groupid=0, jobs=48): err= 0: pid=177583: Fri May 12 00:28:47 2023
write: IOPS=227k, BW=887MiB/s (931MB/s)(52.0GiB/60001msec); 0 zone resets
slat (usec): min=6, max=26750, avg=210.18, stdev=316.16
clat (nsec): min=681, max=26752k, avg=211628.33, stdev=316425.45
lat (usec): min=18, max=27199, avg=421.81, stdev=457.77
clat percentiles (usec):
| 1.00th=[ 16], 5.00th=[ 18], 10.00th=[ 19], 20.00th=[ 21],
| 30.00th=[ 23], 40.00th=[ 26], 50.00th=[ 29], 60.00th=[ 36],
| 70.00th=[ 457], 80.00th=[ 506], 90.00th=[ 545], 95.00th=[ 594],
| 99.00th=[ 1074], 99.50th=[ 1254], 99.90th=[ 3425], 99.95th=[ 3687],
| 99.99th=[ 4686]
bw ( KiB/s): min=710189, max=1653080, per=100.00%, avg=910501.29, stdev=3653.62, samples=5712
iops : min=177535, max=413270, avg=227617.18, stdev=913.41, samples=5712
lat (nsec) : 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=16.56%, 50=49.44%
lat (usec) : 100=0.93%, 250=0.06%, 500=11.29%, 750=18.02%, 1000=1.51%
lat (msec) : 2=1.91%, 4=0.24%, 10=0.03%, 20=0.01%, 50=0.01%
cpu : usr=0.69%, sys=98.61%, ctx=26601, majf=0, minf=582
IO depths : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13631002,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2
Run status group 0 (all jobs):
WRITE: bw=887MiB/s (931MB/s), 887MiB/s-887MiB/s (931MB/s-931MB/s), io=52.0GiB (55.8GB), run=60001-60001msec
Disk stats (read/write):
zram0: ios=0/13604021, merge=0/0, ticks=0/2807107, in_queue=2807107, util=99.95%
+ for i in 1 2 3
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-nowait-on-1.fio
+ for i in 1 2 3 [r(48)][100.0%][r=3378MiB/s][r=865k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-nowait-on-2.fio
+ for i in 1 2 3 [r(48)][100.0%][r=3357MiB/s][r=859k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=20G --filename=/dev/zram0 --output=zram-nowait-on-3.fio
+ modprobe -r zramr(48)][100.0%][r=3357MiB/s][r=859k IOPS][eta 00m:00s]
linux-block (for-next) # for i in IOPS cpu; do grep $i zram*fio | column -t ; done
zram-default-nowait-off-1.fio: read: IOPS=802k, BW=3133MiB/s (3285MB/s)(184GiB/60001msec)
zram-default-nowait-off-2.fio: read: IOPS=796k, BW=3111MiB/s (3262MB/s)(182GiB/60002msec)
zram-default-nowait-off-3.fio: read: IOPS=796k, BW=3108MiB/s (3259MB/s)(182GiB/60002msec)
zram-nowait-on-1.fio: read: IOPS=857k, BW=3346MiB/s (3509MB/s)(196GiB/60001msec)
zram-nowait-on-2.fio: read: IOPS=857k, BW=3347MiB/s (3509MB/s)(196GiB/60001msec)
zram-nowait-on-3.fio: read: IOPS=858k, BW=3353MiB/s (3516MB/s)(196GiB/60001msec)
zram-default-nowait-off-1.fio: cpu : usr=5.82%, sys=13.54%, ctx=36301915, majf=0, minf=687
zram-default-nowait-off-2.fio: cpu : usr=5.84%, sys=13.03%, ctx=37781937, majf=0, minf=640
zram-default-nowait-off-3.fio: cpu : usr=5.84%, sys=12.90%, ctx=37492533, majf=0, minf=688
zram-nowait-on-1.fio: cpu : usr=1.74%, sys=97.62%, ctx=24068, majf=0, minf=783
zram-nowait-on-2.fio: cpu : usr=1.74%, sys=97.57%, ctx=24674, majf=0, minf=763
zram-nowait-on-3.fio: cpu : usr=1.76%, sys=97.59%, ctx=24725, majf=0, minf=723
--
2.40.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-12 8:29 [PATCH 0/3] zram: queue flag nowait and minor cleanup Chaitanya Kulkarni
@ 2023-05-12 8:29 ` Chaitanya Kulkarni
2023-05-12 14:31 ` Christoph Hellwig
2023-05-12 8:29 ` [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write() Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 3/3] zram: add flush_dcache_page() call for write Chaitanya Kulkarni
2 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-12 8:29 UTC (permalink / raw)
To: minchan, senozhatsky; +Cc: axboe, linux-block, Chaitanya Kulkarni
Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
parameter to retain the default behaviour. Also, update respective
allocation flags in the write path. Following are the performance
numbers with io_uring fio engine for random read, note that device has
been populated fully with randwrite workload before taking these
numbers :-
* linux-block (for-next) # grep IOPS zram*fio | column -t
default-nowait-off-1.fio: read: IOPS=802k, BW=3133MiB/s
default-nowait-off-2.fio: read: IOPS=796k, BW=3111MiB/s
default-nowait-off-3.fio: read: IOPS=796k, BW=3108MiB/s
nowait-on-1.fio: read: IOPS=857k, BW=3346MiB/s
nowait-on-2.fio: read: IOPS=857k, BW=3347MiB/s
nowait-on-3.fio: read: IOPS=858k, BW=3353MiB/s
* linux-block (for-next) # grep cpu zram*fio | column -t
default-nowait-off-1.fio: cpu : usr=5.82%, sys=13.54%, ctx=36301915
default-nowait-off-2.fio: cpu : usr=5.84%, sys=13.03%, ctx=37781937
default-nowait-off-3.fio: cpu : usr=5.84%, sys=12.90%, ctx=37492533
nowait-on-1.fio: cpu : usr=1.74%, sys=97.62%, ctx=24068,
nowait-on-2.fio: cpu : usr=1.74%, sys=97.57%, ctx=24674,
nowait-on-3.fio: cpu : usr=1.76%, sys=97.59%, ctx=24725,
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
drivers/block/zram/zram_drv.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f6d90f1ba5cf..b2e419f15f71 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -36,6 +36,10 @@
#include "zram_drv.h"
+static bool g_nowait;
+module_param_named(nowait, g_nowait, bool, 0444);
+MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
+
static DEFINE_IDR(zram_index_idr);
/* idr index must be protected */
static DEFINE_MUTEX(zram_index_mutex);
@@ -1540,9 +1544,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio)
{
- struct page *page = alloc_page(GFP_NOIO);
+ struct page *page;
int ret;
+ page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
if (!page)
return -ENOMEM;
@@ -2215,6 +2220,8 @@ static int zram_add(void)
/* zram devices sort of resembles non-rotational disks */
blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
+ if (g_nowait)
+ blk_queue_flag_set(QUEUE_FLAG_NOWAIT, zram->disk->queue);
/*
* To ensure that we always get PAGE_SIZE aligned
--
2.40.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write()
2023-05-12 8:29 [PATCH 0/3] zram: queue flag nowait and minor cleanup Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT Chaitanya Kulkarni
@ 2023-05-12 8:29 ` Chaitanya Kulkarni
2023-05-12 14:32 ` Christoph Hellwig
2023-05-12 8:29 ` [PATCH 3/3] zram: add flush_dcache_page() call for write Chaitanya Kulkarni
2 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-12 8:29 UTC (permalink / raw)
To: minchan, senozhatsky; +Cc: axboe, linux-block, Chaitanya Kulkarni
zram_bio_read() and zram_bio_write() are 26 lines each and share most of
the code except for the zram_bvec_read() and zram_bvec_write() calls.
Consolidate the code into a single zram_bio_rw() to remove the duplicate
code and an extra function that is only needed for 2 lines of code :-
1c1
< static void zram_bio_read(struct zram *zram, struct bio *bio)
---
> static void zram_bio_write(struct zram *zram, struct bio *bio)
13,14c13,14
< if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) {
< atomic64_inc(&zram->stats.failed_reads);
---
> if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
> atomic64_inc(&zram->stats.failed_writes);
18d17
< flush_dcache_page(bv.bv_page);
diff stats with this patch :-
drivers/block/zram/zram_drv.c | 53 ++++++++++++-----------------------
1 file changed, 18 insertions(+), 35 deletions(-)
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
drivers/block/zram/zram_drv.c | 53 ++++++++++++-----------------------
1 file changed, 18 insertions(+), 35 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b2e419f15f71..fc37419b3735 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1873,38 +1873,12 @@ static void zram_bio_discard(struct zram *zram, struct bio *bio)
bio_endio(bio);
}
-static void zram_bio_read(struct zram *zram, struct bio *bio)
-{
- struct bvec_iter iter;
- struct bio_vec bv;
- unsigned long start_time;
-
- start_time = bio_start_io_acct(bio);
- bio_for_each_segment(bv, bio, iter) {
- u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
- u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
- SECTOR_SHIFT;
-
- if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) {
- atomic64_inc(&zram->stats.failed_reads);
- bio->bi_status = BLK_STS_IOERR;
- break;
- }
- flush_dcache_page(bv.bv_page);
-
- zram_slot_lock(zram, index);
- zram_accessed(zram, index);
- zram_slot_unlock(zram, index);
- }
- bio_end_io_acct(bio, start_time);
- bio_endio(bio);
-}
-
-static void zram_bio_write(struct zram *zram, struct bio *bio)
+static void zram_bio_rw(struct zram *zram, struct bio *bio)
{
struct bvec_iter iter;
struct bio_vec bv;
unsigned long start_time;
+ int ret;
start_time = bio_start_io_acct(bio);
bio_for_each_segment(bv, bio, iter) {
@@ -1912,10 +1886,21 @@ static void zram_bio_write(struct zram *zram, struct bio *bio)
u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
SECTOR_SHIFT;
- if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
- atomic64_inc(&zram->stats.failed_writes);
- bio->bi_status = BLK_STS_IOERR;
- break;
+ if (op_is_write(bio_op(bio))) {
+ ret = zram_bvec_write(zram, &bv, index, offset, bio);
+ if (ret < 0) {
+ atomic64_inc(&zram->stats.failed_writes);
+ bio->bi_status = BLK_STS_IOERR;
+ break;
+ }
+ } else {
+ ret = zram_bvec_read(zram, &bv, index, offset, bio);
+ if (ret < 0) {
+ atomic64_inc(&zram->stats.failed_reads);
+ bio->bi_status = BLK_STS_IOERR;
+ break;
+ }
+ flush_dcache_page(bv.bv_page);
}
zram_slot_lock(zram, index);
@@ -1935,10 +1920,8 @@ static void zram_submit_bio(struct bio *bio)
switch (bio_op(bio)) {
case REQ_OP_READ:
- zram_bio_read(zram, bio);
- break;
case REQ_OP_WRITE:
- zram_bio_write(zram, bio);
+ zram_bio_rw(zram, bio);
break;
case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
--
2.40.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/3] zram: add flush_dcache_page() call for write
2023-05-12 8:29 [PATCH 0/3] zram: queue flag nowait and minor cleanup Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write() Chaitanya Kulkarni
@ 2023-05-12 8:29 ` Chaitanya Kulkarni
2023-05-12 14:34 ` Christoph Hellwig
2 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-12 8:29 UTC (permalink / raw)
To: minchan, senozhatsky; +Cc: axboe, linux-block, Chaitanya Kulkarni
Just like we have a flush_dcache_page() call after zram_bvec_read(), add
the missing flush_dcache_page() call before zram_bvec_write() in order to
handle the cache coherency of the kernel and userspace mappings of the
page for REQ_OP_WRITE handling.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
drivers/block/zram/zram_drv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fc37419b3735..a7954ae80d40 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1887,6 +1887,7 @@ static void zram_bio_rw(struct zram *zram, struct bio *bio)
SECTOR_SHIFT;
if (op_is_write(bio_op(bio))) {
+ flush_dcache_page(bv.bv_page);
ret = zram_bvec_write(zram, &bv, index, offset, bio);
if (ret < 0) {
atomic64_inc(&zram->stats.failed_writes);
--
2.40.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-12 8:29 ` [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT Chaitanya Kulkarni
@ 2023-05-12 14:31 ` Christoph Hellwig
2023-05-12 14:34 ` Jens Axboe
2023-05-13 1:05 ` Chaitanya Kulkarni
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-12 14:31 UTC (permalink / raw)
To: Chaitanya Kulkarni; +Cc: minchan, senozhatsky, axboe, linux-block
On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote:
> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
> parameter to retain the default behaviour. Also, update respective
> allocation flags in the write path. Following are the performance
> numbers with io_uring fio engine for random read, note that device has
> been populated fully with randwrite workload before taking these
> numbers :-
Why would you add a module option, except to make everyones life hell?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write()
2023-05-12 8:29 ` [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write() Chaitanya Kulkarni
@ 2023-05-12 14:32 ` Christoph Hellwig
2023-05-13 1:06 ` Chaitanya Kulkarni
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-12 14:32 UTC (permalink / raw)
To: Chaitanya Kulkarni; +Cc: minchan, senozhatsky, axboe, linux-block
On Fri, May 12, 2023 at 01:29:57AM -0700, Chaitanya Kulkarni wrote:
> zram_bio_read() and zram_bio_write() are 26 lines each and share most of
> the code except for the zram_bvec_read() and zram_bvec_write() calls.
> Consolidate the code into a single zram_bio_rw() to remove the duplicate
> code and an extra function that is only needed for 2 lines of code :-
And it adds an extra conditional to every page in the fast path..
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] zram: add flush_dcache_page() call for write
2023-05-12 8:29 ` [PATCH 3/3] zram: add flush_dcache_page() call for write Chaitanya Kulkarni
@ 2023-05-12 14:34 ` Christoph Hellwig
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-12 14:34 UTC (permalink / raw)
To: Chaitanya Kulkarni; +Cc: minchan, senozhatsky, axboe, linux-block
On Fri, May 12, 2023 at 01:29:58AM -0700, Chaitanya Kulkarni wrote:
> Just like we have a flush_dcache_page() call after zram_bvec_read(), add
> the missing flush_dcache_page() call before zram_bvec_write() in order to
> handle the cache coherency of the kernel and userspace mappings of the
> page for REQ_OP_WRITE handling.
And why do you think this should be required?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-12 14:31 ` Christoph Hellwig
@ 2023-05-12 14:34 ` Jens Axboe
2023-05-13 1:06 ` Chaitanya Kulkarni
2023-05-13 1:05 ` Chaitanya Kulkarni
1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2023-05-12 14:34 UTC (permalink / raw)
To: Christoph Hellwig, Chaitanya Kulkarni; +Cc: minchan, senozhatsky, linux-block
On 5/12/23 8:31 AM, Christoph Hellwig wrote:
> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote:
>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>> parameter to retain the default behaviour. Also, update respective
>> allocation flags in the write path. Following are the performance
>> numbers with io_uring fio engine for random read, note that device has
>> been populated fully with randwrite workload before taking these
>> numbers :-
>
> Why would you add a module option, except to make everyones life hell?
Yeah that makes no sense. Either the driver is nowait compatible and
it should just set the flag, or it's not.
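I.e. if the nowait paths really are sound, zram_add() would just do the
below unconditionally and the knob goes away (sketch of what a v2 would
look like, not posted code):
	/* zram devices sort of resembles non-rotational disks */
	blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, zram->disk->queue);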
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-12 14:31 ` Christoph Hellwig
2023-05-12 14:34 ` Jens Axboe
@ 2023-05-13 1:05 ` Chaitanya Kulkarni
1 sibling, 0 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-13 1:05 UTC (permalink / raw)
To: Christoph Hellwig, Chaitanya Kulkarni
Cc: minchan@kernel.org, senozhatsky@chromium.org, axboe@kernel.dk,
linux-block@vger.kernel.org
On 5/12/23 07:31, Christoph Hellwig wrote:
> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote:
>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>> parameter to retain the default behaviour. Also, update respective
>> allocation flags in the write path. Following are the performance
>> numbers with io_uring fio engine for random read, note that device has
>> been populated fully with randwrite workload before taking these
>> numbers :-
> Why would you add a module option, except to make everyones life hell?
send v2 without modparam.
-ck
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-12 14:34 ` Jens Axboe
@ 2023-05-13 1:06 ` Chaitanya Kulkarni
2023-05-16 5:51 ` Chaitanya Kulkarni
0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-13 1:06 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig, Chaitanya Kulkarni
Cc: minchan@kernel.org, senozhatsky@chromium.org,
linux-block@vger.kernel.org
On 5/12/23 07:34, Jens Axboe wrote:
> On 5/12/23 8:31 AM, Christoph Hellwig wrote:
>> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote:
>>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>>> parameter to retain the default behaviour. Also, update respective
>>> allocation flags in the write path. Following are the performance
>>> numbers with io_uring fio engine for random read, note that device has
>>> been populated fully with randwrite workload before taking these
>>> numbers :-
>> Why would you add a module option, except to make everyones life hell?
> Yeah that makes no sense. Either the driver is nowait compatible and
> it should just set the flag, or it's not.
>
send v2 without modparam.
-ck
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write()
2023-05-12 14:32 ` Christoph Hellwig
@ 2023-05-13 1:06 ` Chaitanya Kulkarni
0 siblings, 0 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-13 1:06 UTC (permalink / raw)
To: Christoph Hellwig, Chaitanya Kulkarni
Cc: minchan@kernel.org, senozhatsky@chromium.org, axboe@kernel.dk,
linux-block@vger.kernel.org
On 5/12/23 07:32, Christoph Hellwig wrote:
> On Fri, May 12, 2023 at 01:29:57AM -0700, Chaitanya Kulkarni wrote:
>> zram_bio_read() and zram_bio_write() are 26 lines each and share most of
>> the code except for the zram_bvec_read() and zram_bvec_write() calls.
>> Consolidate the code into a single zram_bio_rw() to remove the duplicate
>> code and an extra function that is only needed for 2 lines of code :-
> And it adds an extra conditional to every page in the fast path..
Will create a version without the additional branch in v2; something along
the lines of the sketch below ..
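Roughly, the usual constant-propagation trick would keep the consolidation
without a per-page branch: one __always_inline helper that takes a const
direction flag, plus two trivial wrappers, so each caller is compiled with
the branch folded away (untested sketch only, not the actual v2):
static __always_inline void zram_bio_rw(struct zram *zram, struct bio *bio,
					const bool is_write)
{
	struct bvec_iter iter;
	struct bio_vec bv;
	unsigned long start_time;
	start_time = bio_start_io_acct(bio);
	bio_for_each_segment(bv, bio, iter) {
		u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
		u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
				SECTOR_SHIFT;
		/* is_write is a compile-time constant in each wrapper below,
		 * so this test disappears after inlining */
		if (is_write) {
			if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
				atomic64_inc(&zram->stats.failed_writes);
				bio->bi_status = BLK_STS_IOERR;
				break;
			}
		} else {
			if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) {
				atomic64_inc(&zram->stats.failed_reads);
				bio->bi_status = BLK_STS_IOERR;
				break;
			}
			flush_dcache_page(bv.bv_page);
		}
		zram_slot_lock(zram, index);
		zram_accessed(zram, index);
		zram_slot_unlock(zram, index);
	}
	bio_end_io_acct(bio, start_time);
	bio_endio(bio);
}
static void zram_bio_read(struct zram *zram, struct bio *bio)
{
	zram_bio_rw(zram, bio, false);
}
static void zram_bio_write(struct zram *zram, struct bio *bio)
{
	zram_bio_rw(zram, bio, true);
}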
-ck
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-13 1:06 ` Chaitanya Kulkarni
@ 2023-05-16 5:51 ` Chaitanya Kulkarni
2023-05-16 13:08 ` Sergey Senozhatsky
0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-16 5:51 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig
Cc: minchan@kernel.org, senozhatsky@chromium.org,
linux-block@vger.kernel.org
Hi Jens/Christoph,
On 5/12/23 18:06, Chaitanya Kulkarni wrote:
> On 5/12/23 07:34, Jens Axboe wrote:
>> On 5/12/23 8:31 AM, Christoph Hellwig wrote:
>>> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote:
>>>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>>>> parameter to retain the default behaviour. Also, update respective
>>>> allocation flags in the write path. Following are the performance
>>>> numbers with io_uring fio engine for random read, note that device has
>>>> been populated fully with randwrite workload before taking these
>>>> numbers :-
>>> Why would you add a module option, except to make everyones life hell?
>> Yeah that makes no sense. Either the driver is nowait compatible and
>> it should just set the flag, or it's not.
>>
> send v2 without modparam.
>
> -ck
The v2 with the modparam removed is ready to send, but I've a few
concerns about enabling nowait unconditionally for zram :-
From brd data [1] and zram data [2] from my setup :-
IOPs (old->new) | sys cpu% (old->new)
--------------------------------------------------
brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
brd:-
IOPs increased by ~1.5 times (50% up)
sys CPU percentage increased by ~3.0 times (200% up)
zram:-
IOPs increased by ~1.09 times ( 9% up)
sys CPU percentage increased by ~8.81 times (781% up)
This comparison clearly demonstrates that zram experiences a much more
substantial CPU load relative to the increase in IOPs compared to brd.
Such a significant difference might suggest a potential CPU regression
in zram ?
Especially for zram, if applications are not expecting this high cpu
usage then we'll get regression reports with the default nowait
approach. How about we avoid something like this with one of the
following options ?
1. Provide a fix with a module parameter. (Already NACKed.)
2. Allow the user to configure nowait from the command line using zramctl.
Keep QUEUE_FLAG_NOWAIT disabled by default.
3. Add a generic block layer sysfs attr nowait, like nomerges, since
similar changes I've posted for pmem [3] and bcache [4] have the same
issue. This generic way avoids duplicating the code in every driver and
gives the user the freedom to set or unset it based on their needs
(rough sketch below). Keep QUEUE_FLAG_NOWAIT disabled by default.
or please suggest any other way you guys can think is appropriate ...
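For option 3, the block layer side would only be a few lines next to
nomerges in block/blk-sysfs.c, roughly along these lines (untested sketch
just to show the shape; it would still need the entry added to
queue_attrs[]):
static ssize_t queue_nowait_show(struct request_queue *q, char *page)
{
	return queue_var_show(blk_queue_nowait(q), page);
}
static ssize_t queue_nowait_store(struct request_queue *q, const char *page,
				  size_t count)
{
	unsigned long val;
	ssize_t ret;
	ret = queue_var_store(&val, page, count);
	if (ret < 0)
		return ret;
	if (val)
		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
	else
		blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
	return ret;
}
QUEUE_RW_ENTRY(queue_nowait, "nowait");
The admin could then flip it per device via /sys/block/<dev>/queue/nowait,
the same way nomerges is toggled.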
-ck
[1] brd nowait off vs nowait on :-
linux-block (zram-nowait) # grep cpu brd-*fio | column -t
brd-default-nowait-off-1.fio: cpu : usr=6.34%, sys=29.84%, ctx=216249754,
brd-default-nowait-off-2.fio: cpu : usr=6.41%, sys=29.83%, ctx=217773657,
brd-default-nowait-off-3.fio: cpu : usr=6.37%, sys=30.05%, ctx=222667703,
brd-nowait-on-1.fio: cpu : usr=10.18%, sys=88.35%, ctx=23221,
brd-nowait-on-2.fio: cpu : usr=10.02%, sys=86.82%, ctx=22396,
brd-nowait-on-3.fio: cpu : usr=10.17%, sys=86.29%, ctx=22207,
linux-block (zram-nowait) # grep IOPS brd-*fio | column -t
brd-default-nowait-off-1.fio: read: IOPS=3872k, BW=14.8GiB/s
brd-default-nowait-off-2.fio: read: IOPS=3933k, BW=15.0GiB/s
brd-default-nowait-off-3.fio: read: IOPS=3953k, BW=15.1GiB/s
brd-nowait-on-1.fio: read: IOPS=5884k, BW=22.4GiB/s
brd-nowait-on-2.fio: read: IOPS=5870k, BW=22.4GiB/s
brd-nowait-on-3.fio: read: IOPS=5870k, BW=22.4GiB/s
linux-block (zram-nowait) # grep slat brd-*fio | column -t
brd-default-nowait-off-1.fio: slat (nsec): min=440, max=56072k, avg=9579.17
brd-default-nowait-off-2.fio: slat (nsec): min=440, max=42743k, avg=9468.83
brd-default-nowait-off-3.fio: slat (nsec): min=431, max=32493k, avg=9532.96
brd-nowait-on-1.fio: slat (nsec): min=1523, max=37786k, avg=7596.58
brd-nowait-on-2.fio: slat (nsec): min=1503, max=40101k, avg=7612.64
brd-nowait-on-3.fio: slat (nsec): min=1463, max=37298k, avg=7610.89
[2] zram nowait off vs nowait on:-
linux-block (zram-nowait) # grep IOPS zram-*fio | column -t
zram-default-nowait-off-1.fio: read: IOPS=833k, BW=3254MiB/s
zram-default-nowait-off-2.fio: read: IOPS=845k, BW=3301MiB/s
zram-default-nowait-off-3.fio: read: IOPS=845k, BW=3301MiB/s
zram-nowait-on-1.fio: read: IOPS=917k, BW=3582MiB/s
zram-nowait-on-2.fio: read: IOPS=914k, BW=3569MiB/s
zram-nowait-on-3.fio: read: IOPS=917k, BW=3581MiB/s
linux-block (zram-nowait) # grep cpu zram-*fio | column -t
zram-default-nowait-off-1.fio: cpu : usr=5.18%, sys=11.31%, ctx=39945072
zram-default-nowait-off-2.fio: cpu : usr=5.20%, sys=11.49%, ctx=40591907
zram-default-nowait-off-3.fio: cpu : usr=5.31%, sys=11.86%, ctx=40252142
zram-nowait-on-1.fio: cpu : usr=1.87%, sys=97.05%, ctx=24337
zram-nowait-on-2.fio: cpu : usr=1.83%, sys=97.20%, ctx=21452
zram-nowait-on-3.fio: cpu : usr=1.84%, sys=97.20%, ctx=21051
linux-block (zram-nowait) # grep slat zram-*fio | column -t
zram-default-nowait-off-1.fio: slat (nsec): min=420, max=6960.6k, avg=1859.09
zram-default-nowait-off-2.fio: slat (nsec): min=411, max=5387.7k, avg=1848.79
zram-default-nowait-off-3.fio: slat (nsec): min=410, max=6914.2k, avg=1902.51
zram-nowait-on-1.fio: slat (nsec): min=1092, max=32268k, avg=51605.49
zram-nowait-on-2.fio: slat (nsec): min=1052, max=7676.3k, avg=51806.82
zram-nowait-on-3.fio: slat (nsec): min=1062, max=10444k, avg=51625.89
[3]
https://lore.kernel.org/nvdimm/b90ff1ba-b22c-bb37-db0a-4c46bb5f2a06@nvidia.com/T/#t
[4] https://marc.info/?l=linux-bcache&m=168388522305946&w=1
https://marc.info/?l=linux-bcache&m=168401821129652&w=2
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-16 5:51 ` Chaitanya Kulkarni
@ 2023-05-16 13:08 ` Sergey Senozhatsky
2023-05-16 20:41 ` Chaitanya Kulkarni
0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2023-05-16 13:08 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: Jens Axboe, Christoph Hellwig, minchan@kernel.org,
senozhatsky@chromium.org, linux-block@vger.kernel.org
On (23/05/16 05:51), Chaitanya Kulkarni wrote:
> Removed modparam v2 is ready to send, but I've few concerns enabling
> nowait unconditionally for zram :-
>
> From brd data [1] and zram data [2] from my setup :-
>
> IOPs (old->new) | sys cpu% (old->new)
> --------------------------------------------------
> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
> zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
>
> brd:-
> IOPs increased by ~1.5 times (50% up)
> sys CPU percentage increased by ~3.0 times (200% up)
>
> zram:-
> IOPs increased by ~1.09 times ( 9% up)
> sys CPU percentage increased by ~8.81 times (781% up)
>
> This comparison clearly demonstrates that zram experiences a much more
> substantial CPU load relative to the increase in IOPs compared to brd.
> Such a significant difference might suggest a potential CPU regression
> in zram ?
>
> Especially for zram, if applications are not expecting this high cpu
> usage then they we'll get regression reports with default nowait
> approach. How about we avoid something like this with one of the
> following options ?
Well, zram performs decompression/compression on the CPU (per-CPU
crypto streams) for each IO operation, so zram IO is CPU intensive.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-16 13:08 ` Sergey Senozhatsky
@ 2023-05-16 20:41 ` Chaitanya Kulkarni
2023-05-16 20:43 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-16 20:41 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Jens Axboe, Christoph Hellwig, minchan@kernel.org,
linux-block@vger.kernel.org
On 5/16/23 06:08, Sergey Senozhatsky wrote:
> On (23/05/16 05:51), Chaitanya Kulkarni wrote:
>> Removed modparam v2 is ready to send, but I've few concerns enabling
>> nowait unconditionally for zram :-
>>
>> From brd data [1] and zram data [2] from my setup :-
>>
>> IOPs (old->new) | sys cpu% (old->new)
>> --------------------------------------------------
>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
>> zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
>>
>> brd:-
>> IOPs increased by ~1.5 times (50% up)
>> sys CPU percentage increased by ~3.0 times (200% up)
>>
>> zram:-
>> IOPs increased by ~1.09 times ( 9% up)
>> sys CPU percentage increased by ~8.81 times (781% up)
>>
>> This comparison clearly demonstrates that zram experiences a much more
>> substantial CPU load relative to the increase in IOPs compared to brd.
>> Such a significant difference might suggest a potential CPU regression
>> in zram ?
>>
>> Especially for zram, if applications are not expecting this high cpu
>> usage then they we'll get regression reports with default nowait
>> approach. How about we avoid something like this with one of the
>> following options ?
> Well, zram performs decompression/compression on the CPU (per-CPU
> crypto streams) for each IO operation, so zram IO is CPU intensive.
and that is exactly why I've raised this issue; are you okay with that ?
I'll send V2 with nowait enabled by default ..
-ck
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-16 20:41 ` Chaitanya Kulkarni
@ 2023-05-16 20:43 ` Jens Axboe
2023-05-16 21:32 ` Chaitanya Kulkarni
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2023-05-16 20:43 UTC (permalink / raw)
To: Chaitanya Kulkarni, Sergey Senozhatsky
Cc: Christoph Hellwig, minchan@kernel.org,
linux-block@vger.kernel.org
On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote:
> On 5/16/23 06:08, Sergey Senozhatsky wrote:
>> On (23/05/16 05:51), Chaitanya Kulkarni wrote:
>>> Removed modparam v2 is ready to send, but I've few concerns enabling
>>> nowait unconditionally for zram :-
>>>
>>> From brd data [1] and zram data [2] from my setup :-
>>>
>>> IOPs (old->new) | sys cpu% (old->new)
>>> --------------------------------------------------
>>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
>>> zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
>>>
>>> brd:-
>>> IOPs increased by ~1.5 times (50% up)
>>> sys CPU percentage increased by ~3.0 times (200% up)
>>>
>>> zram:-
>>> IOPs increased by ~1.09 times ( 9% up)
>>> sys CPU percentage increased by ~8.81 times (781% up)
>>>
>>> This comparison clearly demonstrates that zram experiences a much more
>>> substantial CPU load relative to the increase in IOPs compared to brd.
>>> Such a significant difference might suggest a potential CPU regression
>>> in zram ?
>>>
>>> Especially for zram, if applications are not expecting this high cpu
>>> usage then they we'll get regression reports with default nowait
>>> approach. How about we avoid something like this with one of the
>>> following options ?
>> Well, zram performs decompression/compression on the CPU (per-CPU
>> crypto streams) for each IO operation, so zram IO is CPU intensive.
>
> and that is exactly I've raised this issue, are you okay with that ?
> I'll send V2 with nowait enabled by default ..
Did you check that it's actually nowait sane to begin with? I spent
literally 30 seconds on that when you sent the first patch, and the
partial/sync path does not look kosher.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-16 20:43 ` Jens Axboe
@ 2023-05-16 21:32 ` Chaitanya Kulkarni
2023-05-16 22:03 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-16 21:32 UTC (permalink / raw)
To: Jens Axboe, Sergey Senozhatsky
Cc: Christoph Hellwig, minchan@kernel.org,
linux-block@vger.kernel.org
On 5/16/23 13:43, Jens Axboe wrote:
> On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote:
>> On 5/16/23 06:08, Sergey Senozhatsky wrote:
>>> On (23/05/16 05:51), Chaitanya Kulkarni wrote:
>>>> Removed modparam v2 is ready to send, but I've few concerns enabling
>>>> nowait unconditionally for zram :-
>>>>
>>>> From brd data [1] and zram data [2] from my setup :-
>>>>
>>>> IOPs (old->new) | sys cpu% (old->new)
>>>> --------------------------------------------------
>>>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
>>>> zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
>>>>
>>>> brd:-
>>>> IOPs increased by ~1.5 times (50% up)
>>>> sys CPU percentage increased by ~3.0 times (200% up)
>>>>
>>>> zram:-
>>>> IOPs increased by ~1.09 times ( 9% up)
>>>> sys CPU percentage increased by ~8.81 times (781% up)
>>>>
>>>> This comparison clearly demonstrates that zram experiences a much more
>>>> substantial CPU load relative to the increase in IOPs compared to brd.
>>>> Such a significant difference might suggest a potential CPU regression
>>>> in zram ?
>>>>
>>>> Especially for zram, if applications are not expecting this high cpu
>>>> usage then they we'll get regression reports with default nowait
>>>> approach. How about we avoid something like this with one of the
>>>> following options ?
>>> Well, zram performs decompression/compression on the CPU (per-CPU
>>> crypto streams) for each IO operation, so zram IO is CPU intensive.
>> and that is exactly I've raised this issue, are you okay with that ?
>> I'll send V2 with nowait enabled by default ..
> Did you check that it's actually nowait sane to begin with? I spent
> literally 30 seconds on that when you sent the first patch, and the
> partial/sync path does not look kosher.
>
I did check for it, and it needs nowait-sane preparation in V2 with
something like this [1], plus returning an error with bio_wouldblock_error()
when we end up in read_from_bdev_sync() and it is not called from
writeback_store() (rough changes in [1]).
But after taking performance numbers repeatedly, the main concern I
failed to resolve is the numbers above and enabling nowait by default
for zram ...
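For reference, the read_from_bdev() end of that bio_wouldblock_error()
handling would look roughly like this on top of [1] (sketch only, untested):
static int read_from_bdev(struct zram *zram, struct page *page,
		unsigned long entry, struct bio *parent, bool nowait)
{
	atomic64_inc(&zram->stats.bd_reads);
	if (!parent) {
		if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
			return -EIO;
		/*
		 * No parent bio means we would have to wait for a
		 * synchronous read on the workqueue; a nowait caller must
		 * not be parked there, so fail with -EAGAIN and let the
		 * submission path complete the bio with BLK_STS_AGAIN
		 * (bio_wouldblock_error()) instead.
		 */
		if (nowait)
			return -EAGAIN;
		return read_from_bdev_sync(zram, page, entry, nowait);
	}
	read_from_bdev_async(zram, page, entry, parent, nowait);
	return 0;
}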
-ck
[1]
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 08d198431a88..c2f154911cc0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -55,7 +55,7 @@ static const struct block_device_operations zram_devops;
static void zram_free_page(struct zram *zram, size_t index);
static int zram_read_page(struct zram *zram, struct page *page, u32 index,
- struct bio *parent, blk_opf_t opf);
+ struct bio *parent, bool nowait);
static int zram_slot_trylock(struct zram *zram, u32 index)
{
@@ -690,7 +690,7 @@ static ssize_t writeback_store(struct device *dev,
/* Need for hugepage writeback racing */
zram_set_flag(zram, index, ZRAM_IDLE);
zram_slot_unlock(zram, index);
- if (zram_read_page(zram, page, index, NULL, 0)) {
+ if (zram_read_page(zram, page, index, NULL, false)) {
zram_slot_lock(zram, index);
zram_clear_flag(zram, index, ZRAM_UNDER_WB);
zram_clear_flag(zram, index, ZRAM_IDLE);
@@ -772,6 +772,7 @@ struct zram_work {
unsigned long entry;
struct page *page;
int error;
+ bool nowait;
};
static void zram_sync_read(struct work_struct *work)
@@ -779,8 +780,9 @@ static void zram_sync_read(struct work_struct *work)
struct zram_work *zw = container_of(work, struct zram_work, work);
struct bio_vec bv;
struct bio bio;
+ blk_opf_t nowait = zw->nowait ? REQ_NOWAIT : 0;
- bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
+ bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ | nowait);
bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
zw->error = submit_bio_wait(&bio);
@@ -792,18 +794,14 @@ static void zram_sync_read(struct work_struct *work)
* use a worker thread context.
*/
static int read_from_bdev_sync(struct zram *zram, struct page *page,
- unsigned long entry, blk_opf_t opf)
+ unsigned long entry, bool nowait)
{
struct zram_work work;
work.page = page;
work.zram = zram;
work.entry = entry;
-
- if (opf & REQ_NOWAIT) {
- zram_sync_read(&work);
- return work.error;
- }
+ work.nowait = nowait;
INIT_WORK_ONSTACK(&work.work, zram_sync_read);
queue_work(system_unbound_wq, &work.work);
@@ -814,21 +812,21 @@ static int read_from_bdev_sync(struct zram *zram, struct page *page,
}
static int read_from_bdev(struct zram *zram, struct page *page,
- unsigned long entry, struct bio *parent, blk_opf_t opf)
+ unsigned long entry, struct bio *parent, bool nowait)
{
atomic64_inc(&zram->stats.bd_reads);
if (!parent) {
if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
return -EIO;
- return read_from_bdev_sync(zram, page, entry, opf);
+ return read_from_bdev_sync(zram, page, entry, nowait);
}
- read_from_bdev_async(zram, page, entry, parent, opf);
+ read_from_bdev_async(zram, page, entry, parent, nowait);
return 0;
}
#else
static inline void reset_bdev(struct zram *zram) {};
static int read_from_bdev(struct zram *zram, struct page *page,
- unsigned long entry, struct bio *parent, blk_opf_t opf)
+ unsigned long entry, struct bio *parent, bool nowait)
{
return -EIO;
}
@@ -1361,7 +1359,7 @@ static int zram_read_from_zspool(struct zram *zram, struct page *page,
}
static int zram_read_page(struct zram *zram, struct page *page, u32 index,
- struct bio *parent, blk_opf_t bi_opf)
+ struct bio *parent, bool nowait)
{
int ret;
@@ -1378,7 +1376,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
zram_slot_unlock(zram, index);
ret = read_from_bdev(zram, page, zram_get_element(zram, index),
- parent, bi_opf);
+ parent, nowait);
}
/* Should NEVER happen. Return bio error if it does. */
@@ -1395,13 +1393,14 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
static int zram_bvec_read_partial(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio)
{
+ bool nowait = bio->bi_opf & REQ_NOWAIT;
struct page *page;
int ret;
- page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
+ page = alloc_page(nowait ? GFP_NOWAIT : GFP_NOIO);
if (!page)
return -ENOMEM;
- ret = zram_read_page(zram, page, index, NULL, bio->bi_opf);
+ ret = zram_read_page(zram, page, index, NULL, nowait);
if (likely(!ret))
memcpy_to_bvec(bvec, page_address(page) + offset);
__free_page(page);
@@ -1413,7 +1412,8 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
{
if (is_partial_io(bvec))
return zram_bvec_read_partial(zram, bvec, index, offset, bio);
- return zram_read_page(zram, bvec->bv_page, index, bio, bio->bi_opf);
+ return zram_read_page(zram, bvec->bv_page, index, bio,
+ bio->bi_opf & REQ_NOWAIT);
}
static int zram_write_page(struct zram *zram, struct page *page, u32 index)
@@ -1547,14 +1547,15 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio)
{
+ bool nowait = bio->bi_opf & REQ_NOWAIT;
struct page *page;
int ret;
- page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
+ page = alloc_page(nowait ? GFP_NOWAIT : GFP_NOIO);
if (!page)
return -ENOMEM;
- ret = zram_read_page(zram, page, index, bio, bio->bi_opf);
+ ret = zram_read_page(zram, page, index, bio, nowait);
if (!ret) {
memcpy_from_bvec(page_address(page) + offset, bvec);
ret = zram_write_page(zram, page, index);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT
2023-05-16 21:32 ` Chaitanya Kulkarni
@ 2023-05-16 22:03 ` Jens Axboe
0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2023-05-16 22:03 UTC (permalink / raw)
To: Chaitanya Kulkarni, Sergey Senozhatsky
Cc: Christoph Hellwig, minchan@kernel.org,
linux-block@vger.kernel.org
On 5/16/23 3:32 PM, Chaitanya Kulkarni wrote:
> On 5/16/23 13:43, Jens Axboe wrote:
>> On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote:
>>> On 5/16/23 06:08, Sergey Senozhatsky wrote:
>>>> On (23/05/16 05:51), Chaitanya Kulkarni wrote:
>>>>> Removed modparam v2 is ready to send, but I've few concerns enabling
>>>>> nowait unconditionally for zram :-
>>>>>
>>>>> From brd data [1] and zram data [2] from my setup :-
>>>>>
>>>>> IOPs (old->new) | sys cpu% (old->new)
>>>>> --------------------------------------------------
>>>>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87)
>>>>> zram | 1.09x ( 841 -> 916) | 9x (11 -> 97)
>>>>>
>>>>> brd:-
>>>>> IOPs increased by ~1.5 times (50% up)
>>>>> sys CPU percentage increased by ~3.0 times (200% up)
>>>>>
>>>>> zram:-
>>>>> IOPs increased by ~1.09 times ( 9% up)
>>>>> sys CPU percentage increased by ~8.81 times (781% up)
>>>>>
>>>>> This comparison clearly demonstrates that zram experiences a much more
>>>>> substantial CPU load relative to the increase in IOPs compared to brd.
>>>>> Such a significant difference might suggest a potential CPU regression
>>>>> in zram ?
>>>>>
>>>>> Especially for zram, if applications are not expecting this high cpu
>>>>> usage then they we'll get regression reports with default nowait
>>>>> approach. How about we avoid something like this with one of the
>>>>> following options ?
>>>> Well, zram performs decompression/compression on the CPU (per-CPU
>>>> crypto streams) for each IO operation, so zram IO is CPU intensive.
>>> and that is exactly I've raised this issue, are you okay with that ?
>>> I'll send V2 with nowait enabled by default ..
>> Did you check that it's actually nowait sane to begin with? I spent
>> literally 30 seconds on that when you sent the first patch, and the
>> partial/sync path does not look kosher.
>>
>
> I did check for it and it needs a nowait sane preparation in V2 with
> something like this [1] plus returning error with bio_wouldblock_error()
That looks pretty awful... Things like:
> @@ -779,8 +780,9 @@ static void zram_sync_read(struct work_struct *work)
> struct zram_work *zw = container_of(work, struct zram_work, work);
> struct bio_vec bv;
> struct bio bio;
> + blk_opf_t nowait = zw->nowait ? REQ_NOWAIT : 0;
>
> - bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
> + bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ | nowait);
> bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
> __bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
> zw->error = submit_bio_wait(&bio);
>
make absolutely no sense, setting REQ_NOWAIT and then waiting on IO
completion.
> when we end up in read_from_bdev_sync() when it is not called from
> writeback_store(). (rough changes [1])
writeback_store() will never have nowait set. Outside of that, pushing
the nowait handling to the workqueue makes no sense, that flag only
applies for inline issue.
> But after taking performance numbers repeatedly, the main concern I
> failed to resolve is above numbers & default enabling nowait for
> zram ...
It's a choice you make if you do nowait IO, normal block IO would not
change at all. So I think it's fine, but the driver needs to be in a
sane state to support nowait to begin with.
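To make that concrete: nowait only changes life for callers that explicitly
ask for it, e.g. a preadv2(RWF_NOWAIT) user gets -EAGAIN instead of sleeping
and retries from a context where blocking is fine, while plain reads are
untouched. Illustrative userspace sketch only, nothing to do with the patch
itself:
#define _GNU_SOURCE
#include <sys/uio.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
	struct iovec iov;
	void *buf;
	ssize_t n;
	int fd;
	fd = open("/dev/zram0", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;
	/* Opt in to nowait: this either completes without blocking or
	 * fails with EAGAIN (it also fails with EAGAIN right away if the
	 * queue does not advertise QUEUE_FLAG_NOWAIT). */
	n = preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
	if (n < 0 && errno == EAGAIN) {
		/* the caller's choice: retry where blocking is fine,
		 * here simply without RWF_NOWAIT */
		n = preadv2(fd, &iov, 1, 0, 0);
	}
	printf("read %zd bytes\n", n);
	free(buf);
	close(fd);
	return 0;
}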
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2023-05-12 8:29 [PATCH 0/3] zram: queue flag nowait and minor cleanup Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 1/3] zram: allow user to set QUEUE_FLAG_NOWAIT Chaitanya Kulkarni
2023-05-12 14:31 ` Christoph Hellwig
2023-05-12 14:34 ` Jens Axboe
2023-05-13 1:06 ` Chaitanya Kulkarni
2023-05-16 5:51 ` Chaitanya Kulkarni
2023-05-16 13:08 ` Sergey Senozhatsky
2023-05-16 20:41 ` Chaitanya Kulkarni
2023-05-16 20:43 ` Jens Axboe
2023-05-16 21:32 ` Chaitanya Kulkarni
2023-05-16 22:03 ` Jens Axboe
2023-05-13 1:05 ` Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 2/3] zram: consolidate zram_bio_read()_zram_bio_write() Chaitanya Kulkarni
2023-05-12 14:32 ` Christoph Hellwig
2023-05-13 1:06 ` Chaitanya Kulkarni
2023-05-12 8:29 ` [PATCH 3/3] zram: add flush_dcache_page() call for write Chaitanya Kulkarni
2023-05-12 14:34 ` Christoph Hellwig