From: Boaz Harrosh <bharrosh@panasas.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Stable Tree <stable@kernel.org>, Greg KH <greg@kroah.com>
Cc: Jens Axboe <jaxboe@fusionio.com>, open-osd <osd-dev@open-osd.org>,
Chris Mason <chris.mason@oracle.com>,
Marc Dionne <marc.c.dionne@gmail.com>
Subject: Re: [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi
Date: Thu, 03 Feb 2011 14:12:27 +0200 [thread overview]
Message-ID: <4D4A9BAB.4000407@panasas.com> (raw)
In-Reply-To: <4D4A8FDE.2030408@panasas.com>
On 02/03/2011 01:22 PM, Boaz Harrosh wrote:
>
> commit: [115e19c] exofs: Set i_mapping->backing_dev_info anyway
>
> Has caused a regression in read-ahead because bdi_init() does
> not set bdi->ra_pages to anything and it was left as zero.
> So when moving all inodes to point to the super-block's
> bdi, they all had ra_pages disabled.
>
> Should bdi_init() be patched to some sane default value?
>
> I fix that by calculating a read_ahead that is:
> - preferable 2 stripes long (I'll add a mount option to next Kernel)
> - Minimum 128K aligned up to stripe-size
> - Caped to maximum-IO-sizes round down to stripe_size.
> (Max sizes are governed by max bio size that fits a page times
> # of devices)
>
> This is a regression since the 2.6.37 Kernel.
> 2.6.36 was fine
>
Dear Greg, Stable Tree.
I see that Linus has already merged the revert of the offending
patch, as a BUG fix for above. That's fine. PLEASE forget about
this patch. (Sorry for the noise)
I will submit this patch as a regular enhancement for the
next merge window.
Thanks
Boaz
> CC: Stable Tree <stable@kernel.org>
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
> fs/exofs/exofs.h | 2 ++
> fs/exofs/inode.c | 17 +++++++++++++----
> fs/exofs/super.c | 18 ++++++++++++++++++
> 3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
> index 2dc925f..99fcb91 100644
> --- a/fs/exofs/exofs.h
> +++ b/fs/exofs/exofs.h
> @@ -256,6 +256,8 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
> }
>
> /* inode.c */
> +unsigned exofs_max_io_pages(struct exofs_layout *layout,
> + unsigned expected_pages);
> int exofs_setattr(struct dentry *, struct iattr *);
> int exofs_write_begin(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned flags,
> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
> index 4268542..deede10 100644
> --- a/fs/exofs/inode.c
> +++ b/fs/exofs/inode.c
> @@ -43,6 +43,17 @@ enum { BIO_MAX_PAGES_KMALLOC =
> PAGE_SIZE / sizeof(struct page *),
> };
>
> +unsigned exofs_max_io_pages(struct exofs_layout *layout,
> + unsigned expected_pages)
> +{
> + unsigned pages = min_t(unsigned, expected_pages, MAX_PAGES_KMALLOC);
> +
> + /* TODO: easily support bio chaining */
> + pages = min_t(unsigned, pages,
> + layout->group_width * BIO_MAX_PAGES_KMALLOC);
> + return pages;
> +}
> +
> struct page_collect {
> struct exofs_sb_info *sbi;
> struct inode *inode;
> @@ -97,8 +108,7 @@ static void _pcol_reset(struct page_collect *pcol)
>
> static int pcol_try_alloc(struct page_collect *pcol)
> {
> - unsigned pages = min_t(unsigned, pcol->expected_pages,
> - MAX_PAGES_KMALLOC);
> + unsigned pages;
>
> if (!pcol->ios) { /* First time allocate io_state */
> int ret = exofs_get_io_state(&pcol->sbi->layout, &pcol->ios);
> @@ -108,8 +118,7 @@ static int pcol_try_alloc(struct page_collect *pcol)
> }
>
> /* TODO: easily support bio chaining */
> - pages = min_t(unsigned, pages,
> - pcol->sbi->layout.group_width * BIO_MAX_PAGES_KMALLOC);
> + pages = exofs_max_io_pages(&pcol->sbi->layout, pcol->expected_pages);
>
> for (; pages; pages >>= 1) {
> pcol->pages = kmalloc(pages * sizeof(struct page *),
> diff --git a/fs/exofs/super.c b/fs/exofs/super.c
> index 79c3ae6..75d42ac 100644
> --- a/fs/exofs/super.c
> +++ b/fs/exofs/super.c
> @@ -383,6 +383,23 @@ static int _read_and_match_data_map(struct exofs_sb_info *sbi, unsigned numdevs,
> return 0;
> }
>
> +static unsigned __ra_pages(struct exofs_layout *layout)
> +{
> + const unsigned _MIN_RA = 32; /* min 128K read-ahead */
> + unsigned ra_pages = layout->group_width * layout->stripe_unit /
> + PAGE_SIZE;
> + unsigned max_io_pages = exofs_max_io_pages(layout, ~0);
> +
> + ra_pages *= 2; /* two stripes */
> + if (ra_pages < _MIN_RA)
> + ra_pages = roundup(_MIN_RA, ra_pages / 2);
> +
> + if (ra_pages > max_io_pages)
> + ra_pages = max_io_pages;
> +
> + return ra_pages;
> +}
> +
> /* @odi is valid only as long as @fscb_dev is valid */
> static int exofs_devs_2_odi(struct exofs_dt_device_info *dt_dev,
> struct osd_dev_info *odi)
> @@ -616,6 +633,7 @@ static int exofs_fill_super(struct super_block *sb, void *data, int silent)
> }
>
> /* set up operation vectors */
> + sbi->bdi.ra_pages = __ra_pages(&sbi->layout);
> sb->s_bdi = &sbi->bdi;
> sb->s_fs_info = sbi;
> sb->s_op = &exofs_sops;
prev parent reply other threads:[~2011-02-03 12:12 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-02 16:48 Please Help: 2.6.37 a_ops->readpages is never called, 2.6.36 was fine Boaz Harrosh
2011-02-02 18:46 ` Problem found, but why??? Boaz Harrosh
2011-02-02 21:51 ` Chris Mason
2011-02-03 11:22 ` [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi Boaz Harrosh
2011-02-03 12:12 ` Boaz Harrosh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D4A9BAB.4000407@panasas.com \
--to=bharrosh@panasas.com \
--cc=chris.mason@oracle.com \
--cc=greg@kroah.com \
--cc=jaxboe@fusionio.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=marc.c.dionne@gmail.com \
--cc=osd-dev@open-osd.org \
--cc=stable@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.