From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Tso <tytso@mit.edu>,
ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: ENOSPC returned during writepages
Date: Thu, 21 Aug 2008 20:48:15 +0530 [thread overview]
Message-ID: <20080821151815.GD6509@skywalker> (raw)
In-Reply-To: <1219269325.7895.45.camel@mingming-laptop>
On Wed, Aug 20, 2008 at 02:55:25PM -0700, Mingming Cao wrote:
>
> 在 2008-08-20三的 07:53 -0400,Theodore Tso写道:
> > On Wed, Aug 20, 2008 at 04:16:44PM +0530, Aneesh Kumar K.V wrote:
> > > > mpage_da_map_blocks block allocation failed for inode 323784 at logical
> > > > offset 313 with max blocks 11 with error -28
> > > > This should not happen.!! Data will be lost
> >
> > We don't actually lose the data if free blocks are subsequently made
> > available, correct?
> >
>
> Well, I thought with Aneesh's new ext4_da_invalidate patch in the patch
> queue, the dirty page get invalidate if ext4_da_writepages() could not
> successfully map/allocate blocks. That means we lost data:(
>
> I have a feeling that we did not try very hard before invalidate the
> dirty page which fail to map to disks. Perhaps we should try a few more
> times before give up. Also in that case, perhaps we should turn off
> delalloc fs wide, so the new writers won't take the subsequently made
> avaible free blocks away from this unlucky delalloc da writepages.
How do we try hard ? The mballoc already try had to allocate blocks. So I
am not sure what do we achieve by requesting for block allocation again.
>
> > > I tried this patch. There are still multiple ways we can get wrong free
> > > block count. The patch reduced the number of errors. So we are doing
> > > better with patch. But I guess we can't use the percpu_counter based
> > > free block accounting with delalloc. Without delalloc it is ok even if
> > > we find some wrong free blocks count . The actual block allocation will fail in
> > > that case and we handle it perfectly fine. With delalloc we cannot
> > > afford to fail the block allocation. Should we look at a free block
> > > accounting rewrite using simple ext4_fsblk_t and and a spin lock ?
> >
> > It would be a shame if we did given that the whole point of the percpu
> > counter was to avoid a scalability bottleneck. Perhaps we could take
> > a filesystem-level spinlock only when the number of free blocks as
> > reported by the percpu_counter falls below some critical level?
>
> Perhaps the thresh hold should b higher, but other than that, the
> current ext4_has_free_blocks() code, does 1) get the freeblocks counter
> 2) if the counter < FBC_BATCH , it will call
> percpu_counter_sum_and_set(), which will take the per-cpu-counter lock,
> and do accurate accounting.
>
> So after think again, I could not see what suggested above diffrent from
> what current ext4_has_free_blocks() does?
>
>
> Right now the ext4_has_free_blocks() uses the
>
> #define FBC_BATCH (NR_CPUS*4)
>
> as the thresh hold. I thought that was good enough as
> ext4_da_reserve_space() only request 1 block at a time (called at
> write_begin time), but maybe I am wrong...
>
I have right now threshold check as below.
+ /* Each CPU can accumulate FBC_BATCH blocks in their local
+ * counters. So we need to make sure we have free blocks more
+ * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times.
+ */
+ if (free_blocks - (nblocks + root_blocks) <
+ (4 * (FBC_BATCH * nr_cpu_ids)))
{
-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-08-21 15:18 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-20 5:43 ENOSPC returned during writepages Aneesh Kumar K.V
2008-08-20 10:46 ` Aneesh Kumar K.V
2008-08-20 11:53 ` Theodore Tso
2008-08-20 18:27 ` Aneesh Kumar K.V
2008-08-20 21:35 ` Mingming Cao
2008-08-21 15:15 ` Aneesh Kumar K.V
2008-08-20 19:25 ` Andreas Dilger
2008-08-20 19:34 ` Theodore Tso
2008-08-20 20:56 ` Mingming Cao
2008-08-20 21:55 ` Theodore Tso
2008-08-20 22:02 ` Mingming Cao
2008-08-20 23:22 ` Mingming Cao
2008-08-20 23:42 ` Andreas Dilger
2008-08-20 23:58 ` Mingming Cao
2008-08-21 1:44 ` Andreas Dilger
2008-08-20 21:55 ` Mingming Cao
2008-08-21 15:18 ` Aneesh Kumar K.V [this message]
2008-08-21 15:35 ` Theodore Tso
2008-08-21 17:17 ` Mingming Cao
2008-08-23 11:12 ` Andreas Dilger
2008-08-21 15:12 ` Aneesh Kumar K.V
2008-08-21 16:56 ` Mingming Cao
2008-08-20 21:58 ` Mingming Cao
2008-08-21 15:09 ` Aneesh Kumar K.V
2008-08-21 5:06 ` Eric Sandeen
2008-08-21 16:45 ` Aneesh Kumar K.V
2008-08-21 17:07 ` Mingming Cao
2008-08-21 17:31 ` Aneesh Kumar K.V
2008-08-21 18:06 ` Mingming Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080821151815.GD6509@skywalker \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=cmm@us.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox