From: Christoph Hellwig <hch@lst.de>
To: viro@zeniv.linux.org.uk, axboe@kernel.dk
Cc: Milosz Tanski <milosz@adfin.com>,
Goldwyn Rodrigues <rgoldwyn@suse.com>,
mgorman@suse.de, Volker.Lendecke@sernet.de,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Subject: non-blocking buffered reads V5
Date: Tue, 29 Aug 2017 16:13:17 +0200
Message-ID: <20170829141321.4482-1-hch@lst.de>
This series resurrects the old patches from Milosz to implement
non-blocking buffered reads. Thanks to the non-blocking AIO code from
Goldwyn, the implementation becomes pretty much trivial.
I've also forward-ported the test Milosz sent to recent xfsprogs to
verify that this series works properly, but I still have to address
the review comments for it. I'll also volunteer to work with Goldwyn to
properly document the RWF_NOWAIT flag in the man page, including this
change.
Changes from V4:
- improve conditionals in generic_file_buffered_read
Changes from V3:
- forward ported to the latest kernel
- fixed a compiler warning
Changes from V2:
- keep returning -EOPNOTSUPP for the unsupported buffered write case
- add block device node support
- rebase against current Linus' tree, which has all the requirements
Changes from V1:
- fix btrfs to reject nowait buffered writes
- tested btrfs and ext4 in addition to xfs this time
Here are additional details from the original cover letter from Milosz,
where the flag was still called RWF_NONBLOCK:
Background:
Using a threadpool to emulate non-blocking operations on regular buffered
files is a common pattern today (samba, libuv, etc.). Applications split the
work between network-bound threads (epoll) and an IO threadpool. Not every
application can use the sendfile syscall (TLS / post-processing).
This common pattern leads to increased request latency. Latency can be due to
additional synchronization between the threads, or to a fast request (cached
data) stuck behind a slow request (large / uncached data).
The preadv2 syscall with RWF_NONBLOCK lets userspace applications bypass
enqueuing the operation in the threadpool if the data is already available in
the page cache, as in the sketch below.
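A minimal sketch of that fast-read pattern, using the RWF_NOWAIT name the
flag ended up with (this assumes glibc and kernel headers new enough to
expose preadv2 and RWF_NOWAIT; submit_to_threadpool() is a hypothetical
stand-in for the application's IO queue):

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <errno.h>

/* Hypothetical handoff to the application's IO threadpool. */
extern void submit_to_threadpool(int fd, void *buf, size_t len, off_t off);

/*
 * Try a non-blocking "fast read" first, and fall back to the threadpool
 * only if the data is not already in the page cache.
 */
static ssize_t fast_read(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);

	if (ret < 0 && errno == EAGAIN)
		submit_to_threadpool(fd, buf, len, off);
	/* Note: a short read can mean only part of the range was cached. */
	return ret;
}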
Performance numbers (newer Samba):
https://drive.google.com/file/d/0B3maCn0jCvYncndGbXJKbGlhejQ/view?usp=sharing
https://docs.google.com/spreadsheets/d/1GGTivi-MfZU0doMzomG4XUo9ioWtRvOGQ5FId042L6s/edit?usp=sharing
Performance numbers (older):
Some perf data generated using fio, comparing the posix aio engine to a
version of the posix aio engine that attempts to perform "fast" reads before
submitting the operations to the queue. This workload runs on an ext4
partition on raid0 (test / build rig), simulating our database access pattern
with 16KB read accesses. Our database uses a home-spun posix aio-like queue
(samba does the same thing).
f1: ~73% rand read over mostly cached data (zipf med-size dataset)
f2: ~18% rand read over mostly un-cached data (uniform large-dataset)
f3: ~9% seq-read over large dataset
before:
f1:
bw (KB /s): min= 11, max= 9088, per=0.56%, avg=969.54, stdev=827.99
lat (msec) : 50=0.01%, 100=1.06%, 250=5.88%, 500=4.08%, 750=12.48%
lat (msec) : 1000=17.27%, 2000=49.86%, >=2000=9.42%
f2:
bw (KB /s): min= 2, max= 1882, per=0.16%, avg=273.28, stdev=220.26
lat (msec) : 250=5.65%, 500=3.31%, 750=15.64%, 1000=24.59%, 2000=46.56%
lat (msec) : >=2000=4.33%
f3:
bw (KB /s): min= 0, max=265568, per=99.95%, avg=174575.10, stdev=34526.89
lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.27%, 50=10.82%
lat (usec) : 100=50.34%, 250=5.05%, 500=7.12%, 750=6.60%, 1000=4.55%
lat (msec) : 2=8.73%, 4=3.49%, 10=1.83%, 20=0.89%, 50=0.22%
lat (msec) : 100=0.05%, 250=0.02%, 500=0.01%
total:
READ: io=102365MB, aggrb=174669KB/s, minb=240KB/s, maxb=173599KB/s, mint=600001msec, maxt=600113msec
after (with fast read using preadv2 before submit):
f1:
bw (KB /s): min= 3, max=14897, per=1.28%, avg=2276.69, stdev=2930.39
lat (usec) : 2=70.63%, 4=0.01%
lat (msec) : 250=0.20%, 500=2.26%, 750=1.18%, 2000=0.22%, >=2000=25.53%
f2:
bw (KB /s): min= 2, max= 2362, per=0.14%, avg=249.83, stdev=222.00
lat (msec) : 250=6.35%, 500=1.78%, 750=9.29%, 1000=20.49%, 2000=52.18%
lat (msec) : >=2000=9.99%
f3:
bw (KB /s): min= 1, max=245448, per=100.00%, avg=177366.50, stdev=35995.60
lat (usec) : 2=64.04%, 4=0.01%, 10=0.01%, 20=0.06%, 50=0.43%
lat (usec) : 100=0.20%, 250=1.27%, 500=2.93%, 750=3.93%, 1000=7.35%
lat (msec) : 2=14.27%, 4=2.88%, 10=1.54%, 20=0.81%, 50=0.22%
lat (msec) : 100=0.05%, 250=0.02%
total:
READ: io=103941MB, aggrb=177339KB/s, minb=213KB/s, maxb=176375KB/s, mint=600020msec, maxt=600178msec
Interpreting the results, you can see total bandwidth stays the same but
overall request latency decreases in the f1 (random, mostly cached) and f3
(sequential) workloads. There is a slight bump in latency for f2 since it's
random data that's unlikely to be cached, but we always attempt the "fast
read" anyway.
In our application we have started keeping track of "fast read" hits and
misses, and for files / requests with a low hit ratio we skip the "fast
reads", mostly getting rid of the extra latency in the uncached cases (see
the sketch below). In our real-world workload we were able to reduce average
response time by 20 to 30% (depending on the amount of IO done by the
request).
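A hypothetical sketch of that hit/miss tracking (the counter layout and the
32-sample / 25% thresholds are illustrative, not what our application
actually uses):

/* Hypothetical per-file "fast read" statistics. */
struct fastread_stats {
	unsigned long hits;	/* preadv2 with RWF_NOWAIT returned data */
	unsigned long misses;	/* failed with EAGAIN, went to the threadpool */
};

/* Skip the fast-read attempt once the observed hit ratio drops too low. */
static int should_try_fast_read(const struct fastread_stats *s)
{
	unsigned long total = s->hits + s->misses;

	if (total < 32)		/* not enough samples yet, keep trying */
		return 1;
	return s->hits * 100 >= total * 25;	/* hit ratio >= 25%? */
}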
I've performed other benchmarks and have not observed any perf regressions in
any of the normal (old) code paths.
Thread overview: 8+ messages
2017-08-29 14:13 Christoph Hellwig [this message]
2017-08-29 14:13 ` [PATCH 1/4] fs: pass iocb to do_generic_file_read Christoph Hellwig
2017-08-29 14:13 ` [PATCH 2/4] fs: support IOCB_NOWAIT in generic_file_buffered_read Christoph Hellwig
2017-09-04 12:47 ` Jan Kara
2017-08-29 14:13 ` [PATCH 3/4] fs: support RWF_NOWAIT for buffered reads Christoph Hellwig
2017-08-29 14:13 ` [PATCH 4/4] block_dev: support RWF_NOWAIT on block device nodes Christoph Hellwig
2017-09-01 8:13 ` non-blocking buffered reads V5 Christoph Hellwig
2017-09-01 14:58 ` Jens Axboe