From: Suparna Bhattacharya <suparna@in.ibm.com>
To: linux-aio@kvack.org, akpm@osdl.org, drepper@redhat.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	jakub@redhat.com, mingo@elte.hu
Subject: [PATCHSET 1][PATCH 0/6] Filesystem AIO read/write
Date: Thu, 28 Dec 2006 13:53:08 +0530
Message-ID: <20061228082308.GA4476@in.ibm.com>
In-Reply-To: <20061227153855.GA25898@in.ibm.com>


Currently native Linux AIO is properly supported (in the sense of
actually being asynchronous) only for files opened with O_DIRECT.
While this suffices for a major (and the most visible) user of AIO, i.e. databases,
other types of users like Samba require AIO support for regular file IO.
Also, for glibc POSIX AIO to be able to switch to using native AIO instead
of the current simulation using threads, it needs asynchronous
behaviour for both O_DIRECT and buffered file AIO.

This patchset implements changes to make filesystem AIO read
and write asynchronous for the non O_DIRECT case. This mainly matters
for reads of uncached or partially cached files, and for O_SYNC writes.

Instead of translating regular IO to [AIO + wait], the patchset translates
AIO to [regular IO - blocking + retries]. The intent is to avoid modifying
or slowing down normal (non-AIO) usage, by keeping that path pretty much
as it is, while also avoiding code duplication. The AIO vs regular IO
checks are made inside io_schedule(), i.e. at the blocking points. The
low-level unit of distinction is a wait queue entry, which in the AIO
case is contained in an iocb and in the synchronous IO case is
associated with the calling task.
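
As a rough illustration of that distinction at a blocking point, here is
a single-threaded userspace model in C. It is a sketch under simplifying
assumptions, not code from the patchset: every name in it
(sketch_wait_entry, sketch_iocb, sketch_page, sketch_wait_on_page,
sketch_complete_io, sketch_aio_wake, SKETCH_RETRY) is hypothetical, a
page has at most one waiter, and the synchronous "sleep" is modelled by
simulating the IO completing in place. What it shows is that one wait
call either waits (entry tied to the task) or leaves the entry queued
and returns a retry indication (entry embedded in an iocb).

```c
/* Hypothetical userspace model of the AIO/sync split at a blocking point.
   None of these names come from the patchset. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_OK = 0, SKETCH_RETRY = 1 };   /* hypothetical return codes */

struct sketch_iocb;

/* Simplified wait queue entry: the low-level unit of distinction. */
struct sketch_wait_entry {
    struct sketch_iocb *iocb;   /* non-NULL: entry is embedded in an AIO iocb;
                                   NULL: entry belongs to the calling task */
    bool queued;                /* currently on a page's wait queue */
};

struct sketch_iocb {
    struct sketch_wait_entry wait;  /* wait entry contained in the iocb */
    int retries_kicked;             /* retries queued by wakeup callbacks */
};

/* Simulated page; at most one waiter in this model. */
struct sketch_page {
    bool uptodate;
    struct sketch_wait_entry *waiter;
};

/* AIO wakeup callback: rather than waking a task, queue the iocb for retry. */
static void sketch_aio_wake(struct sketch_iocb *iocb)
{
    iocb->retries_kicked++;
}

/* Simulates IO completing on pg and waking whoever waits on it. */
static void sketch_complete_io(struct sketch_page *pg)
{
    struct sketch_wait_entry *w = pg->waiter;

    pg->uptodate = true;
    pg->waiter = NULL;
    if (w != NULL) {
        w->queued = false;
        if (w->iocb != NULL)
            sketch_aio_wake(w->iocb);
        /* else: a real kernel would wake the sleeping task here */
    }
}

/* The blocking point: the only place where AIO and sync IO diverge. */
static int sketch_wait_on_page(struct sketch_page *pg, struct sketch_wait_entry *w)
{
    assert(pg != NULL && w != NULL);
    if (pg->uptodate)
        return SKETCH_OK;

    w->queued = true;
    pg->waiter = w;
    if (w->iocb != NULL)
        return SKETCH_RETRY;    /* AIO: leave the entry queued, don't sleep */

    /* Sync: the kernel would sleep in io_schedule() until woken; this
       single-threaded model simulates the IO finishing instead. */
    sketch_complete_io(pg);
    return SKETCH_OK;
}
```

With a task-owned entry the call returns only once the page is up to
date; with an iocb-owned entry it returns SKETCH_RETRY at once, and the
later completion goes through the callback path, which in this model
just counts a queued retry.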

The core idea is that we complete as much IO as we can in a non-blocking
fashion, and then continue the remaining part of the transfer when woken
up asynchronously via a wait queue callback once pages are ready. Each
iteration thus progresses through more of the request until it is
completed. The interesting part here is that, owing largely to the
idempotence of radix-tree page cache traversal, every iteration is simply
a smaller read/write. Almost all of the iocb manipulation and advancement
in the AIO case happens in the high-level AIO code rather than in the
regular VFS/filesystem paths.

The following is a sampling of comparative aio-stress results with and
without the patches (each run starts with uncached files):

----------------------------------------------------------------------------

aio-stress throughput comparisons (in MB/s):

file size 1GB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
4-way Pentium III SMP box, Adaptec AIC-7896/7 Ultra2 SCSI, 40 MB/s
Filesystem: ext2

----------------------------------------------------------------------------
                         Buffered (non O_DIRECT)        O_DIRECT
                         Vanilla        Patched         Vanilla   Patched
----------------------------------------------------------------------------
Random-Read              10.08          23.91           18.91     18.98
Random-O_SYNC-Write       8.86          15.84           16.51     16.53
Sequential-Read          31.49          33.00           31.86     31.79
Sequential-O_SYNC-Write   8.68          32.60           31.45     32.44
Random-Write             31.09 (19.65)  30.90 (19.65)
Sequential-Write         30.84 (28.94)  30.09 (28.39)
----------------------------------------------------------------------------
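
To put the buffered columns in relative terms, the Patched/Vanilla ratio
can be computed directly from the table. The snippet below only does
that arithmetic; the struct and function names are illustrative, and the
parenthesized Random-Write/Sequential-Write figures are left out because
the post does not say what they denote.

```c
/* Ratio arithmetic on the buffered figures from the table above. */
#include <assert.h>
#include <stddef.h>

struct sketch_row {
    const char *name;
    double vanilla;     /* buffered (non O_DIRECT), MB/s */
    double patched;     /* buffered (non O_DIRECT), MB/s */
};

/* Unparenthesized figures only. */
static const struct sketch_row sketch_rows[] = {
    { "Random-Read",             10.08, 23.91 },
    { "Random-O_SYNC-Write",      8.86, 15.84 },
    { "Sequential-Read",         31.49, 33.00 },
    { "Sequential-O_SYNC-Write",  8.68, 32.60 },
    { "Random-Write",            31.09, 30.90 },
    { "Sequential-Write",        30.84, 30.09 },
};

#define SKETCH_NROWS (sizeof sketch_rows / sizeof sketch_rows[0])

/* Patched throughput divided by vanilla throughput. */
static double sketch_speedup(const struct sketch_row *r)
{
    assert(r->vanilla > 0.0);
    return r->patched / r->vanilla;
}

/* Index of the row with the largest ratio. */
static size_t sketch_best_row(void)
{
    size_t best = 0;

    for (size_t i = 1; i < SKETCH_NROWS; i++)
        if (sketch_speedup(&sketch_rows[i]) > sketch_speedup(&sketch_rows[best]))
            best = i;
    return best;
}
```

This works out to roughly 2.4x for Random-Read, 1.8x for
Random-O_SYNC-Write and 3.8x for Sequential-O_SYNC-Write, while
Sequential-Read and the plain (non O_SYNC) writes stay within about 5%
of vanilla.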

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


