From: Abhijith Das <adas@redhat.com>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>,
Steve Dickson <steved@redhat.com>,
Jeff Layton <jlayton@redhat.com>,
lsf-pc@lists.linux-foundation.org,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Ganesha NFS List <nfs-ganesha-devel@lists.sourceforge.net>,
Frank S Filz <ffilz@us.ibm.com>,
"J. Bruce Fields" <bfields@redhat.com>,
Jim Lieb <jlieb@panasas.com>,
Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>,
DENIEL Philippe <philippe.deniel@cea.fr>
Subject: Re: [1/8] readdir-plus system call
Date: Mon, 8 Apr 2013 15:02:40 -0400 (EDT)
Message-ID: <1784100361.2097625.1365447760580.JavaMail.root@redhat.com>
In-Reply-To: <51629A76.1020609@panasas.com>
Hi Boaz/All,
----- Original Message -----
> From: "Boaz Harrosh" <bharrosh@panasas.com>
> To: "Steven Whitehouse" <swhiteho@redhat.com>, "Steve Dickson" <steved@redhat.com>, "Jeff Layton"
> <jlayton@redhat.com>, lsf-pc@lists.linux-foundation.org, "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "Ganesha
> NFS List" <nfs-ganesha-devel@lists.sourceforge.net>, "Frank S Filz" <ffilz@us.ibm.com>, "J. Bruce Fields"
> <bfields@redhat.com>, "Jim Lieb" <jlieb@panasas.com>, "Venkateswararao Jujjuri" <jvrao@linux.vnet.ibm.com>, "DENIEL
> Philippe" <philippe.deniel@cea.fr>
> Sent: Monday, April 8, 2013 5:22:46 AM
> Subject: [1/8] readdir-plus system call
>
> By: Steven Whitehouse <swhiteho@redhat.com>)
>
> I repeat below Steve's original mail. Steve you said you have
> some experimental code, could you post an header and a git URL
> so we can have a look?
The patchset I'm working on is in a local tree, but the latest bits are available in this Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=850426#c14
From a GFS2 perspective, the need for such a system call arose from our talks with the Samba folks about better supporting clustered Samba over GFS2. The system call simply collects dirents along with stat and extended attributes and copies the info out to the user buffer. This patchset is a first attempt at tackling this problem from a GFS2 perspective and is mainly a way to get us talking about possible implementations.
As the patches stand right now, the VFS bits are just hooks and all the real work is done in the GFS2 filesystem. However, there are some bits that could be moved into the VFS so other filesystems can utilize them.
For obtaining stat info, I'm making use of the VFS bits of the xstat and fxstat system calls that David Howells proposed here: https://lists.samba.org/archive/samba-technical/2012-April/082906.html
There are 4 parts to my readdirplus (xgetdents()) patches:
Patch 1 of 4 adds the xgetdents() syscall interface, the xreaddir() f_op and the linux_xdirent structure that specifies how the collected data is packaged for the user. From the caller's perspective, it behaves very much like the getdents() syscall except for the -EAGAIN return code, which requires the caller to re-issue the syscall with the same parameters.
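To make the -EAGAIN convention concrete, here is a minimal userspace sketch of the caller-side retry. The function-pointer signature is an assumption modelled on getdents() (the real xgetdents() prototype in the patches may well differ), and fake_xgetdents() is only a stand-in so the loop can be exercised without the patched kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature, modelled on getdents(); not taken from the patch. */
typedef long (*xgetdents_fn)(unsigned int fd, void *buf, size_t count);

/*
 * Re-issue fn with the same parameters for as long as it returns -EAGAIN,
 * up to max_tries attempts. Returns the last value returned by fn, which
 * is -EAGAIN if every attempt asked for a retry.
 */
static long xgetdents_retry(xgetdents_fn fn, unsigned int fd, void *buf,
                            size_t count, int max_tries)
{
    long ret = -EAGAIN;
    int i;

    assert(fn != NULL && max_tries > 0);
    for (i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = fn(fd, buf, count);
    return ret;
}

/* Stand-in for the syscall: asks for a retry twice, then reports 128 bytes. */
static int fake_calls;

static long fake_xgetdents(unsigned int fd, void *buf, size_t count)
{
    (void)fd;
    (void)buf;
    (void)count;
    return (++fake_calls <= 2) ? -EAGAIN : 128;
}
```

Bounding the number of attempts is a choice made for this sketch only; a real caller might prefer to retry indefinitely or back off between attempts.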
Patch 2 of 4 is a GFS2 patch that adds a resizeable buffer data structure backed by a vector of pages. This is used to collect all the intermediate data before writing it out to the user buffer.
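As a rough illustration of the idea (not the GFS2 code itself), a page-vector buffer can be sketched in userspace as below. The names page_vec, pv_append() and friends, and the 4096-byte page size, are all assumptions made for this example; the kernel version would allocate real pages rather than use malloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PV_PAGE_SIZE 4096u  /* assumed page size for this sketch */

struct page_vec {
    char **pages;   /* vector of page pointers */
    size_t npages;  /* pages currently allocated */
    size_t used;    /* bytes written so far */
};

/* Grow the vector until at least `need` bytes fit. Returns 0 or -1. */
static int pv_reserve(struct page_vec *pv, size_t need)
{
    size_t want = (need + PV_PAGE_SIZE - 1) / PV_PAGE_SIZE;
    char **p;

    if (want <= pv->npages)
        return 0;
    p = realloc(pv->pages, want * sizeof(*p));
    if (!p)
        return -1;
    pv->pages = p;
    while (pv->npages < want) {
        char *pg = malloc(PV_PAGE_SIZE);
        if (!pg)
            return -1;
        pv->pages[pv->npages++] = pg;
    }
    return 0;
}

/* Append len bytes, spilling across page boundaries as needed. */
static int pv_append(struct page_vec *pv, const void *data, size_t len)
{
    const char *src = data;

    if (pv_reserve(pv, pv->used + len))
        return -1;
    while (len) {
        size_t off = pv->used % PV_PAGE_SIZE;
        size_t n = PV_PAGE_SIZE - off;
        if (n > len)
            n = len;
        memcpy(pv->pages[pv->used / PV_PAGE_SIZE] + off, src, n);
        pv->used += n;
        src += n;
        len -= n;
    }
    return 0;
}

/* Copy len bytes starting at logical offset pos into out. */
static void pv_read(const struct page_vec *pv, size_t pos, void *out, size_t len)
{
    char *dst = out;

    assert(pos + len <= pv->used);
    while (len) {
        size_t off = pos % PV_PAGE_SIZE;
        size_t n = PV_PAGE_SIZE - off;
        if (n > len)
            n = len;
        memcpy(dst, pv->pages[pos / PV_PAGE_SIZE] + off, n);
        pos += n;
        dst += n;
        len -= n;
    }
}

/* Release every page and the vector itself. */
static void pv_free(struct page_vec *pv)
{
    size_t i;

    for (i = 0; i < pv->npages; i++)
        free(pv->pages[i]);
    free(pv->pages);
    memset(pv, 0, sizeof(*pv));
}
```

Growing by whole pages means existing data never has to be copied when the buffer is resized; only the small vector of pointers is reallocated.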
Patch 3 of 4 is a simple port of the sort() function from lib/sort.c, called ctx_sort(). The only difference is that it takes an additional (void *) opaque context pointer and passes it to the compare() and swap() functions. I needed this to be able to sort pointers stored in the vector-of-pages buffer.
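The shape of such a context-passing sort can be sketched in userspace as follows. This is a plain heapsort written for illustration, not the code from the patch; the name ctx_sort_sketch(), the callback typedefs and the blk_ctx example (sorting entry indices by a block number kept in the context) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*ctx_cmp_fn)(const void *a, const void *b, void *ctx);
typedef void (*ctx_swap_fn)(void *a, void *b, size_t size, void *ctx);

/* Restore the max-heap property below byte offset r, heap ends at byte `end`. */
static void sift_down(char *b, size_t r, size_t end, size_t size,
                      ctx_cmp_fn cmp, ctx_swap_fn swap, void *ctx)
{
    size_t c;

    for (; r * 2 + size < end; r = c) {
        c = r * 2 + size;                       /* left child */
        if (c + size < end && cmp(b + c, b + c + size, ctx) < 0)
            c += size;                          /* right child is larger */
        if (cmp(b + r, b + c, ctx) >= 0)
            break;
        swap(b + r, b + c, size, ctx);
    }
}

/* Heapsort that hands an opaque ctx pointer to every cmp() and swap() call. */
static void ctx_sort_sketch(void *base, size_t num, size_t size,
                            ctx_cmp_fn cmp, ctx_swap_fn swap, void *ctx)
{
    char *b = base;
    size_t n = num * size;  /* array length in bytes */
    size_t i;

    assert(base != NULL || num == 0);
    if (num < 2)
        return;
    for (i = (num / 2) * size; i > 0; ) {       /* build the heap */
        i -= size;
        sift_down(b, i, n, size, cmp, swap, ctx);
    }
    for (i = n - size; i > 0; i -= size) {      /* extract the maximum */
        swap(b, b + i, size, ctx);
        sift_down(b, 0, i, size, cmp, swap, ctx);
    }
}

/* Hypothetical context: per-entry block numbers, looked up by index. */
struct blk_ctx {
    const unsigned long long *blkno;
    unsigned int swaps;                         /* counts swap() calls */
};

static int cmp_by_blkno(const void *a, const void *b, void *ctx)
{
    const struct blk_ctx *c = ctx;
    unsigned long long x = c->blkno[*(const unsigned int *)a];
    unsigned long long y = c->blkno[*(const unsigned int *)b];

    return (x > y) - (x < y);
}

static void swap_idx(void *a, void *b, size_t size, void *ctx)
{
    unsigned int t = *(unsigned int *)a;

    (void)size;
    *(unsigned int *)a = *(unsigned int *)b;
    *(unsigned int *)b = t;
    ((struct blk_ctx *)ctx)->swaps++;
}
```

The point of the extra argument is visible in cmp_by_blkno(): the elements being sorted are only indices, and the comparison needs the context to find the data they refer to.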
Patch 4 of 4 has GFS2's implementation of the xreaddir() f_op and all its supporting functions. gfs2_xreaddir() tries to collect the requested data efficiently by ordering disk block accesses based on the filesystem's on-disk layout and also by adjusting the resizeable buffer as needed.
In my quick testing with a 50,000 file directory, xgetdents() is at least twice as fast as getdents()+stat()+getxattr() with a cold cache and nearly thrice as fast when the disk blocks have been cached.
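For reference, the conventional path being compared against looks roughly like the sketch below: one directory read, then a stat and an xattr query per entry. This is not the test program behind the numbers above; walk_baseline() is a hypothetical helper that uses readdir() (which is built on getdents()), fstatat() and llistxattr(), and it only asks for the size of each xattr list rather than fetching values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Walk dirpath the conventional way. Adds each entry's st_size to
 * *total_size and returns the number of entries that could be stat'ed,
 * or -1 if the directory cannot be opened.
 */
static long walk_baseline(const char *dirpath, long long *total_size)
{
    DIR *d;
    struct dirent *de;
    long n = 0;

    assert(dirpath != NULL && total_size != NULL);
    d = opendir(dirpath);
    if (!d)
        return -1;

    while ((de = readdir(d)) != NULL) {
        char path[PATH_MAX];
        struct stat st;

        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;

        /* One extra syscall per entry for the stat data. */
        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;

        /* And at least one more per entry for the extended attributes. */
        snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        (void)llistxattr(path, NULL, 0);

        *total_size += st.st_size;
        n++;
    }
    closedir(d);
    return n;
}
```

Every entry costs at least two additional round trips into the kernel here, and the lookups happen in directory order rather than on-disk order.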
Cheers!
--Abhi