From: Christoph Hellwig
To: viro@zeniv.linux.org.uk, axboe@fb.com
Cc: milosz@adfin.com, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-api@vger.kernel.org
Subject: selective block polling and preadv2/pwritev2 revisited
Date: Thu, 24 Dec 2015 15:14:18 +0100
Message-Id: <1450966464-6847-1-git-send-email-hch@lst.de>

This series allows selectively enabling or disabling polling for
completions in the block layer [1] on a per-I/O basis. For this it
resurrects the preadv2/pwritev2 syscalls that Milosz prepared a while
ago (and which are much simpler now due to VFS changes that happened in
the meantime). That approach also had a man page update prepared, which
I will resubmit with the current flags once this series makes it in.

Polling for block I/O is important to reduce the latency on flash and
post-flash storage technologies. On the fastest NVMe controller I have
access to it almost halves latencies from over 7 microseconds to about
4 microseconds. But it is only useful if we actually care about the
latency of this particular I/O, and it is generally a waste if enabled
for all I/O to a given device. This series uses the per-I/O flags in
preadv2/pwritev2 to control this behavior. The alternative would be a
new O_* flag set at open time or using fcntl, but this is still too
coarse-grained for some applications and we're starting to run out of
open flags.

Note that there are plenty of other use cases for preadv2/pwritev2 as
well, but I'd like to concentrate on this one for now. Examples are:
non-blocking reads (the original purpose), per-I/O O_SYNC, user space
support for T10 DIF/DIX application tags, and probably some more.

[1] only supported for NVMe at the moment.
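
For illustration, here is a minimal user-space sketch of what per-I/O
polling selection could look like. It is not taken from the series. It
assumes the glibc preadv2() wrapper (glibc 2.26 or later) and the flag
name RWF_HIPRI, which is what later mainline kernels use; the cover
letter itself does not name the flag. read_at() and read_at_selftest()
are hypothetical helpers. On files or devices that do not poll, the
flag is expected to be accepted and simply have no effect.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: flag name and value as in later mainline kernels. */
#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001
#endif

/*
 * Hypothetical helper: read up to len bytes at offset off. When
 * low_latency is nonzero, request polling for this one I/O only;
 * other I/O on the same descriptor is unaffected.
 */
ssize_t read_at(int fd, void *buf, size_t len, off_t off, int low_latency)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t n;

	assert(buf != NULL);
	n = preadv2(fd, &iov, 1, off, low_latency ? RWF_HIPRI : 0);

	/* No preadv2 or no flag support: fall back to a plain preadv. */
	if (n < 0 && (errno == ENOSYS || errno == EOPNOTSUPP))
		n = preadv(fd, &iov, 1, off);
	return n;
}

/* Small self-check on a temporary file; returns 0 on success. */
int read_at_selftest(void)
{
	char path[] = "/tmp/read_at_XXXXXX";
	char buf[5];
	int ok, fd = mkstemp(path);

	if (fd < 0)
		return -1;
	ok = write(fd, "hello world", 11) == 11 &&
	     read_at(fd, buf, sizeof(buf), 6, 1) == 5 &&
	     memcmp(buf, "world", 5) == 0 &&
	     read_at(fd, buf, sizeof(buf), 0, 0) == 5 &&
	     memcmp(buf, "hello", 5) == 0;
	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```

A latency-sensitive caller would pass low_latency = 1 only for the
reads it actually waits on, and 0 for everything else. That is the
granularity an open-time O_* flag cannot provide.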