* Hot topics for the next release
From: Chris Mason @ 2008-08-06 14:21 UTC
To: linux-btrfs

Hello everyone,

Now that v0.16 is out the door, I'd like to get a thread going on topics
people are interested in tackling next.  The top of my list looks like
this:

Improved allocator threading
Better in-memory free space indexing (Josef)
Better fsync performance (Chris)
Improved offline fsck
NFS Support
O_DIRECT support

But anything on the development timeline is fair game.  I'm shooting for
a smaller number of changes this time around so a new release can be cut
before the kernel summit and plumber's conference in mid-September.

-chris

* Re: Hot topics for the next release
From: David Woodhouse @ 2008-08-06 14:58 UTC
To: Chris Mason; +Cc: linux-btrfs

On Wed, 2008-08-06 at 10:21 -0400, Chris Mason wrote:
> NFS Support

This is basically ready. All you need in btrfs is the two patches from
Balaji Rao, which I've updated to apply to the 0.16 and put in
git.infradead.org/users/dwmw2/btrfs-kernel-unstable.git (along with a
build fix for 2.6.27-rc2, which is also below).

The rest of it is a generic problem with NFSD, for which the (current)
fix is at git.infradead.org/users/dwmw2/nfsexport-2.6.git

You could perhaps copy the readdir hack into btrfs code for use with
obsolete kernels -- but to be honest I'd be inclined to leave that for
the masochists^Wenterprise folks.

From 6c5f1012ccb1bb8a55dc9e564db3ca15d893763b Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Wed, 6 Aug 2008 15:54:51 +0100
Subject: [PATCH] Change TestSetPageLocked() to trylock_page()

Add backwards compatibility in compat.h

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 compat.h    |    3 +++
 extent_io.c |    3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/compat.h b/compat.h
index d39a768..b3349a6 100644
--- a/compat.h
+++ b/compat.h
@@ -1,6 +1,9 @@
 #ifndef _COMPAT_H_
 #define _COMPAT_H_
 
+#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,26)
+#define trylock_page(page) (!TestSetPageLocked(page))
+#endif
 
 /*
  * Even if AppArmor isn't enabled, it still has different prototypes.
diff --git a/extent_io.c b/extent_io.c
index 1cf4bab..f46f886 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -14,6 +14,7 @@
 #include <linux/pagevec.h>
 #include "extent_io.h"
 #include "extent_map.h"
+#include "compat.h"
 
 /* temporary define until extent_map moves out of btrfs */
 struct kmem_cache *btrfs_cache_create(const char *name, size_t size,
@@ -3055,7 +3056,7 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
 	for (i = start_i; i < num_pages; i++) {
 		page = extent_buffer_page(eb, i);
 		if (!wait) {
-			if (TestSetPageLocked(page))
+			if (!trylock_page(page))
 				goto unlock_exit;
 		} else {
 			lock_page(page);
--
1.5.5.1

--
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

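The compat.h hunk in the patch above inverts a return value, which is easy to misread: TestSetPageLocked() reports whether the page was already locked, while trylock_page() reports whether the caller got the lock. The following is a minimal user-space sketch of that relationship, not kernel code. The names fake_page, test_set_page_locked, trylock_page_model and unlock_page_model are hypothetical stand-ins, and the model is not atomic, unlike the real page-flag operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only a lock flag. */
struct fake_page {
	bool locked;
};

/*
 * Models the old helper: set the lock bit and return its previous
 * value, so a nonzero result means "somebody else already holds it".
 * (The real kernel operation is atomic; this model is not.)
 */
static int test_set_page_locked(struct fake_page *p)
{
	int was_locked = p->locked;

	p->locked = true;
	return was_locked;
}

/* Models the new helper: nonzero means "we acquired the lock". */
static int trylock_page_model(struct fake_page *p)
{
	return !test_set_page_locked(p);
}

static void unlock_page_model(struct fake_page *p)
{
	p->locked = false;
}

/* Returns 1 if the model behaves as described above. */
static int trylock_demo(void)
{
	struct fake_page p = { false };
	int first = trylock_page_model(&p);	/* lock was free */
	int second = trylock_page_model(&p);	/* lock already held */
	int third;

	unlock_page_model(&p);
	third = trylock_page_model(&p);		/* free again */

	assert(first && !second && third);
	return first && !second && third;
}
```

This is why the call site changes from `if (TestSetPageLocked(page))` to `if (!trylock_page(page))`: both branches mean "the page was already locked, so bail out".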
* Re: Hot topics for the next release
From: Chris Mason @ 2008-08-06 15:13 UTC
To: David Woodhouse; +Cc: linux-btrfs

On Wed, 2008-08-06 at 15:58 +0100, David Woodhouse wrote:
> On Wed, 2008-08-06 at 10:21 -0400, Chris Mason wrote:
> > NFS Support
>
> This is basically ready. All you need in btrfs is the two patches from
> Balaji Rao, which I've updated to apply to the 0.16 and put in
> git.infradead.org/users/dwmw2/btrfs-kernel-unstable.git (along with a
> build fix for 2.6.27-rc2, which is also below).
>
> The rest of it is a generic problem with NFSD, for which the (current)
> fix is at git.infradead.org/users/dwmw2/nfsexport-2.6.git
>
> You could perhaps copy the readdir hack into btrfs code for use with
> obsolete kernels -- but to be honest I'd be inclined to leave that for
> the masochists^Wenterprise folks.
>

We do need the readdir hack, being able to test on older kernels (say
2.6.26) is a big part of attracting and keeping btrfs testers.

Thanks for the trylock_page, I'll toss it in.

-chris

* Re: Hot topics for the next release
From: Toei Rei @ 2008-08-06 15:26 UTC
To: linux-btrfs

>> You could perhaps copy the readdir hack into btrfs code for use with
>> obsolete kernels -- but to be honest I'd be inclined to leave that for
>> the masochists^Wenterprise folks.
>>
>
> We do need the readdir hack, being able to test on older kernels (say
> 2.6.26) is a big part of attracting and keeping btrfs testers.
>

Guess you're talking about the sadistic people like me?

Rei

* Re: Hot topics for the next release
From: David Woodhouse @ 2008-08-06 18:45 UTC
To: Chris Mason; +Cc: linux-btrfs

On Wed, 2008-08-06 at 11:13 -0400, Chris Mason wrote:
> We do need the readdir hack, being able to test on older kernels (say
> 2.6.26) is a big part of attracting and keeping btrfs testers.

Well, those testers don't seem to have been put off so far by the fact
that you can't export it by NFS. But it's easy enough to copy it over.

Added to git.infradead.org/users/dwmw2/btrfs-kernel-unstable.git

From: David Woodhouse <David.Woodhouse@intel.com>
Date: Wed, 6 Aug 2008 19:42:33 +0100
Subject: [PATCH] Implement our own copy of the nfsd readdir hack, for older kernels

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 ctree.h  |    4 ++
 export.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 inode.c  |    8 ++++-
 3 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/ctree.h b/ctree.h
index 3694f03..7200178 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1694,6 +1694,7 @@ void btrfs_destroy_inode(struct inode *inode);
 int btrfs_init_cachep(void);
 void btrfs_destroy_cachep(void);
 long btrfs_ioctl_trans_end(struct file *file);
+int btrfs_real_readdir(struct file *filp, void *dirent, filldir_t filldir);
 struct inode *btrfs_iget_locked(struct super_block *s, u64 objectid,
 			    struct btrfs_root *root);
 struct inode *btrfs_ilookup(struct super_block *s, u64 objectid,
@@ -1709,6 +1710,9 @@ int btrfs_update_inode(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root,
 			      struct inode *inode);
 
+/* export.c */
+int btrfs_nfshack_readdir(struct file *filp, void *dirent, filldir_t filldir);
+
 /* ioctl.c */
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 
diff --git a/export.c b/export.c
index 9070674..d152fbc 100644
--- a/export.c
+++ b/export.c
@@ -181,3 +181,97 @@ const struct export_operations btrfs_export_ops = {
 	.fh_to_parent	= btrfs_fh_to_parent,
 	.get_parent	= btrfs_get_parent,
 };
+
+/* Kernels without FS_LOOKUP_IN_READDIR still have the NFS deadlock where
+   nfsd will call the file system's ->lookup() method from within its
+   filldir callback, which in turn was called from the file system's
+   ->readdir() method. And will deadlock for many file systems. */
+#ifndef FS_LOOKUP_IN_READDIR
+
+struct nfshack_dirent {
+	u64		ino;
+	loff_t		offset;
+	int		namlen;
+	unsigned int	d_type;
+	char		name[];
+};
+
+struct nfshack_readdir {
+	char		*dirent;
+	size_t		used;
+};
+
+
+
+static int btrfs_nfshack_filldir(void *__buf, const char *name, int namlen,
+				 loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct nfshack_readdir *buf = __buf;
+	struct nfshack_dirent *de = (void *)(buf->dirent + buf->used);
+	unsigned int reclen;
+
+	reclen = ALIGN(sizeof(struct nfshack_dirent) + namlen, sizeof(u64));
+	if (buf->used + reclen > PAGE_SIZE)
+		return -EINVAL;
+
+	de->namlen = namlen;
+	de->offset = offset;
+	de->ino = ino;
+	de->d_type = d_type;
+	memcpy(de->name, name, namlen);
+	buf->used += reclen;
+
+	return 0;
+}
+
+int btrfs_nfshack_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+	struct nfshack_readdir buf;
+	struct nfshack_dirent *de;
+	int err;
+	int size;
+	loff_t offset;
+
+	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf.dirent)
+		return -ENOMEM;
+
+	offset = file->f_pos;
+
+	while (1) {
+		unsigned int reclen;
+
+		buf.used = 0;
+
+		err = btrfs_real_readdir(file, &buf, btrfs_nfshack_filldir);
+		if (err)
+			break;
+
+		size = buf.used;
+
+		if (!size)
+			break;
+
+		de = (struct nfshack_dirent *)buf.dirent;
+		while (size > 0) {
+			offset = de->offset;
+
+			if (filldir(dirent, de->name, de->namlen, de->offset,
+				    de->ino, de->d_type))
+				goto done;
+			offset = file->f_pos;
+
+			reclen = ALIGN(sizeof(*de) + de->namlen,
+				       sizeof(u64));
+			size -= reclen;
+			de = (struct nfshack_dirent *)((char *)de + reclen);
+		}
+	}
+
+ done:
+	free_page((unsigned long)buf.dirent);
+	file->f_pos = offset;
+
+	return err;
+}
+#endif
diff --git a/inode.c b/inode.c
index 393b7aa..f8b3fde 100644
--- a/inode.c
+++ b/inode.c
@@ -1956,7 +1956,7 @@ static unsigned char btrfs_filetype_table[] = {
 	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
-static int btrfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
+int btrfs_real_readdir(struct file *filp, void *dirent, filldir_t filldir)
 {
 	struct inode *inode = filp->f_dentry->d_inode;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -3661,7 +3661,11 @@ static struct inode_operations btrfs_dir_ro_inode_operations = {
 static struct file_operations btrfs_dir_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= generic_read_dir,
-	.readdir	= btrfs_readdir,
+#ifdef FS_LOOKUP_IN_READDIR /* NFSd readdir/lookup deadlock is fixed */
+	.readdir	= btrfs_real_readdir,
+#else /* otherwise, we need to work around it ourselves */
+	.readdir	= btrfs_nfshack_readdir,
+#endif
 	.unlocked_ioctl	= btrfs_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= btrfs_ioctl,
--
1.5.5.1

--
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

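The idea behind the readdir hack in the patch above is to split directory reading into two phases: first copy entries into a private page-sized buffer, then replay them to the caller's filldir callback once the file system's own readdir has returned, so a lookup from inside the callback no longer nests inside readdir. The following is a rough user-space sketch of just the buffering and replay steps, assuming a 4096-byte buffer in place of PAGE_SIZE. All names here (rec, rec_buf, buf_fill, buf_replay, count_emit, readdir_hack_demo) are hypothetical and chosen for illustration; it does not model f_pos handling or the outer refill loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096	/* stands in for PAGE_SIZE */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* One buffered directory entry, followed directly by its name bytes. */
struct rec {
	uint64_t ino;
	int64_t offset;
	int namlen;
	unsigned int d_type;
	char name[];
};

struct rec_buf {
	char *data;	/* malloc'ed, so suitably aligned for struct rec */
	size_t used;
};

typedef int (*emit_fn)(void *ctx, const char *name, int namlen,
		       int64_t offset, uint64_t ino, unsigned int d_type);

/* Record length rounded up so the next record stays 8-byte aligned. */
static size_t rec_len(int namlen)
{
	return ALIGN_UP(sizeof(struct rec) + (size_t)namlen, sizeof(uint64_t));
}

/* Phase 1: copy an entry into the buffer; nonzero means "buffer full". */
static int buf_fill(void *ctx, const char *name, int namlen,
		    int64_t offset, uint64_t ino, unsigned int d_type)
{
	struct rec_buf *buf = ctx;
	size_t reclen = rec_len(namlen);
	struct rec *r;

	if (buf->used + reclen > BUF_SIZE)
		return -1;
	r = (struct rec *)(buf->data + buf->used);
	r->ino = ino;
	r->offset = offset;
	r->namlen = namlen;
	r->d_type = d_type;
	memcpy(r->name, name, (size_t)namlen);
	buf->used += reclen;
	return 0;
}

/* Phase 2: replay buffered entries to the real consumer; returns the count. */
static int buf_replay(const struct rec_buf *buf, emit_fn emit, void *ctx)
{
	size_t pos = 0;
	int n = 0;

	while (pos < buf->used) {
		const struct rec *r = (const struct rec *)(buf->data + pos);

		if (emit(ctx, r->name, r->namlen, r->offset, r->ino, r->d_type))
			break;
		n++;
		pos += rec_len(r->namlen);
	}
	return n;
}

/* Toy consumer that only counts the entries it is handed. */
static int count_emit(void *ctx, const char *name, int namlen,
		      int64_t offset, uint64_t ino, unsigned int d_type)
{
	(void)name; (void)namlen; (void)offset; (void)ino; (void)d_type;
	*(int *)ctx += 1;
	return 0;
}

/* Buffers two made-up entries, replays them, returns how many arrived. */
static int readdir_hack_demo(void)
{
	struct rec_buf buf = { malloc(BUF_SIZE), 0 };
	int seen = 0;

	if (!buf.data)
		return -1;
	if (buf_fill(&buf, "a", 1, 1, 100, 4) ||
	    buf_fill(&buf, "hello.txt", 9, 2, 101, 8)) {
		free(buf.data);
		return -1;
	}
	assert(buf.used % sizeof(uint64_t) == 0);
	buf_replay(&buf, count_emit, &seen);
	free(buf.data);
	return seen;
}
```

In the patch itself the same split appears as btrfs_nfshack_filldir() (phase 1) and the inner loop of btrfs_nfshack_readdir() (phase 2), with the outer loop refilling the page until the directory is exhausted or the caller's filldir asks to stop.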
* Re: Hot topics for the next release
From: jim owens @ 2008-08-06 15:42 UTC
To: Chris Mason; +Cc: linux-btrfs

> Improved allocator threading

I wanted to work on the allocator with a larger scope
where threading is only a minor part of trying to address
these items from the Project_ideas that I think could change
disk format in some way (to fix it before v1.0):
   - Different sector sizes
   - Multiple chunk trees and extent allocation trees
   - Limiting btree failure domains
and maybe impacting this from Development_timeline
   - Reserved space for online fsck and the ability to add
     storage so that a background extent allocation check can proceed

Maybe this is too ambitious or I am seeing intersections that
are not there, but I am prepared to try doing the allocator.

jim

P.S.  Are there other V1.0 format issues to lock down that
should be worked before the missing features like O_DIRECT?

* Re: Hot topics for the next release
From: Chris Mason @ 2008-08-06 16:36 UTC
To: jim owens; +Cc: linux-btrfs

On Wed, 2008-08-06 at 11:42 -0400, jim owens wrote:
> > Improved allocator threading
>
> I wanted to work on the allocator with a larger scope
> where threading is only a minor part of trying to address

Josef's allocator fix is on the list because we currently fall over in
some workloads at 100% cpu time when the FS is 60% full.  The space
indexing is complex and strange, it just needs to be redone.

> these items from the Project_ideas that I think could change
> disk format in some way (to fix it before v1.0):
>    - Different sector sizes

Sector alignment and sector sizes definitely need to happen before 1.0

>    - Multiple chunk trees and extent allocation trees

For these I was planning on only adding the disk format bits needed and
leaving the code alone.

>    - Limiting btree failure domains
> and maybe impacting this from Development_timeline
>    - Reserved space for online fsck and the ability to add
>      storage so that a background extent allocation check can proceed

The reserved space is important as well.

>
> Maybe this is too ambitious or I am seeing intersections that
> are not there, but I am prepared to try doing the allocator.
>

I'd love to have help on all of the above, and you're welcome to dive in
and give it a shot.  I'd say to pick one though, starting with smaller
patches is going to be a good idea.

> jim
>
> P.S.  Are there other V1.0 format issues to lock down that
> should be worked before the missing features like O_DIRECT?

Yes, I'm trying to walk the line between having enough performance for
people to do baseline tests (the results of which may force disk format
changes) and pushing out the disk format changes.

So, things that are very well understood like multiple copies of the
super block or compat flags, I'm pushing off.

-chris

* Re: Hot topics for the next release
From: jim owens @ 2008-08-06 20:36 UTC
To: Chris Mason; +Cc: linux-btrfs

Chris Mason wrote:

> Josef's allocator fix is on the list because we currently fall over in
> some workloads at 100% cpu time when the FS is 60% full.  The space
> indexing is complex and strange, it just needs to be redone.

I don't understand.  Do you mean Josef wants someone to fix
multithreading or do you mean he is doing that as part of
an allocator fix he is working on, or did you mean that
his work is an exception and you really were not looking at
doing any allocator changes?

I see this in your 0.16 list:

> Better in-memory free space indexing (Josef)

but did not tie it to the allocator.  Is that it?

and I'll add an area I missed before in the Development_timeline:
   - Fallocate support (at least disk format level)
as part of the allocator bundle-of-too-much-work :)

jim

* Re: Hot topics for the next release
From: Chris Mason @ 2008-08-06 20:43 UTC
To: jim owens; +Cc: linux-btrfs

On Wed, 2008-08-06 at 16:36 -0400, jim owens wrote:
> Chris Mason wrote:
>
> > Josef's allocator fix is on the list because we currently fall over in
> > some workloads at 100% cpu time when the FS is 60% full.  The space
> > indexing is complex and strange, it just needs to be redone.
>
> I don't understand.  Do you mean Josef wants someone to fix
> multithreading or do you mean he is doing that as part of
> an allocator fix he is working on, or did you mean that
> his work is an exception and you really were not looking at
> doing any allocator changes?
>

Josef is fixing the way the allocator indexes free space in ram.  This
is different from working on the threading, but I'm holding off on the
threading until after he is done.

> I see this in your 0.16 list:
>
> > Better in-memory free space indexing (Josef)
>
> but did not tie it to the allocator.  Is that it?
>
> and I'll add an area I missed before in the Development_timeline:
>    - Fallocate support (at least disk format level)
> as part of the allocator bundle-of-too-much-work :)

;) fallocate would be important as well.

-chris

* Re: Hot topics for the next release
From: Joe Peterson @ 2008-08-06 20:49 UTC
To: Chris Mason; +Cc: jim owens, linux-btrfs

>> Chris Mason wrote:
>>
>>> Josef's allocator fix is on the list because we currently fall over in
>>> some workloads at 100% cpu time when the FS is 60% full.

Chris, does it oops, or just get very slow?  Does 0.15 do the same?

						-Joe

* Re: Hot topics for the next release
From: Chris Mason @ 2008-08-06 20:53 UTC
To: Joe Peterson; +Cc: jim owens, linux-btrfs

On Wed, 2008-08-06 at 14:49 -0600, Joe Peterson wrote:
> >> Chris Mason wrote:
> >>
> >>> Josef's allocator fix is on the list because we currently fall over in
> >>> some workloads at 100% cpu time when the FS is 60% full.
>
> Chris, does it oops, or just get very slow?  Does 0.15 do the same?
>

Very very slow and v0.15 has the same feature.  This doesn't happen
every time you hit 60% full, it varies with the workload.

-chris

Thread overview: 11+ messages
2008-08-06 14:21 Hot topics for the next release Chris Mason
2008-08-06 14:58 ` David Woodhouse
2008-08-06 15:13   ` Chris Mason
2008-08-06 15:26     ` Toei Rei
2008-08-06 18:45     ` David Woodhouse
2008-08-06 15:42 ` jim owens
2008-08-06 16:36   ` Chris Mason
2008-08-06 20:36     ` jim owens
2008-08-06 20:43       ` Chris Mason
2008-08-06 20:49         ` Joe Peterson
2008-08-06 20:53           ` Chris Mason