linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] can not allocate space for caching data
@ 2010-12-20 12:25 Miao Xie
  2010-12-20 12:44 ` Chris Mason
  0 siblings, 1 reply; 5+ messages in thread
From: Miao Xie @ 2010-12-20 12:25 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linux Btrfs

Hi, Chris

There is something wrong with this patch:

commit 83a50de97fe96aca82389e061862ed760ece2283
Author: Chris Mason <chris.mason@oracle.com>
Date:   Mon Dec 13 15:06:46 2010 -0500

    Btrfs: prevent RAID level downgrades when space is low
    
    The extent allocator has code that allows us to fill
    allocations from any available block group, even if it doesn't
    match the raid level we've requested.
    
    This was put in because adding a new drive to a filesystem
    made with the default mkfs options actually upgrades the metadata from
    single spindle dup to full RAID1.
    
    But, the code also allows us to allocate from a raid0 chunk when we
    really want a raid1 or raid10 chunk.  This can cause big trouble because
    mkfs creates a small (4MB) raid0 chunk for data and metadata which then
    goes unused for raid1/raid10 installs.
    
    The allocator will happily wander in and allocate from that chunk when
    things get tight, which is not correct.
    
    The fix here is to make sure that we provide duplication when the
    caller has asked for it.  It does all the dups to be any raid level,
    which preserves the dup->raid1 upgrade abilities.
    
    Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs has added the space of single chunks and raid0 chunks into the space
information, so when we use btrfs_check_data_free_space() to check if there
is some space for storing file data, this function may return true. So we
write the data into the cache successfully. But, the extent allocator can
not allocate any space to store that cached data, and then the file system
panic.

I think we subtract that space from the space information, or split the space
information into two types, one is used to manage the chunks with duplication,
the other manages the other chunks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] can not allocate space for caching data
  2010-12-20 12:25 [BUG] can not allocate space for caching data Miao Xie
@ 2010-12-20 12:44 ` Chris Mason
  2010-12-20 13:13   ` Miao Xie
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Mason @ 2010-12-20 12:44 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
> Hi, Chris
> 
> There is something wrong with this patch:
> 
> commit 83a50de97fe96aca82389e061862ed760ece2283
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Mon Dec 13 15:06:46 2010 -0500
> 
>     Btrfs: prevent RAID level downgrades when space is low
>     
>     The extent allocator has code that allows us to fill
>     allocations from any available block group, even if it doesn't
>     match the raid level we've requested.
>     
>     This was put in because adding a new drive to a filesystem
>     made with the default mkfs options actually upgrades the metadata from
>     single spindle dup to full RAID1.
>     
>     But, the code also allows us to allocate from a raid0 chunk when we
>     really want a raid1 or raid10 chunk.  This can cause big trouble because
>     mkfs creates a small (4MB) raid0 chunk for data and metadata which then
>     goes unused for raid1/raid10 installs.
>     
>     The allocator will happily wander in and allocate from that chunk when
>     things get tight, which is not correct.
>     
>     The fix here is to make sure that we provide duplication when the
>     caller has asked for it.  It does all the dups to be any raid level,
>     which preserves the dup->raid1 upgrade abilities.
>     
>     Signed-off-by: Chris Mason <chris.mason@oracle.com>
> 
> Btrfs has added the space of single chunks and raid0 chunks into the space
> information, so when we use btrfs_check_data_free_space() to check if there
> is some space for storing file data, this function may return true. So we
> write the data into the cache successfully. But, the extent allocator can
> not allocate any space to store that cached data, and then the file system
> panic.
> 
> I think we subtract that space from the space information, or split the space
> information into two types, one is used to manage the chunks with duplication,
> the other manages the other chunks.

Ok, do you have a test case that triggers this?  I'll work out a patch.
Yan Zheng's original idea of 'the chunks should be readonly' should help
us deduct them from the total.

-chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] can not allocate space for caching data
  2010-12-20 12:44 ` Chris Mason
@ 2010-12-20 13:13   ` Miao Xie
  2010-12-20 15:41     ` Chris Mason
  0 siblings, 1 reply; 5+ messages in thread
From: Miao Xie @ 2010-12-20 13:13 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linux Btrfs

On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
> Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
>> Hi, Chris
>>
>> There is something wrong with this patch:
>>
>> commit 83a50de97fe96aca82389e061862ed760ece2283
>> Author: Chris Mason<chris.mason@oracle.com>
>> Date:   Mon Dec 13 15:06:46 2010 -0500
>>
>>      Btrfs: prevent RAID level downgrades when space is low
>>
>>      The extent allocator has code that allows us to fill
>>      allocations from any available block group, even if it doesn't
>>      match the raid level we've requested.
>>
>>      This was put in because adding a new drive to a filesystem
>>      made with the default mkfs options actually upgrades the metadata from
>>      single spindle dup to full RAID1.
>>
>>      But, the code also allows us to allocate from a raid0 chunk when we
>>      really want a raid1 or raid10 chunk.  This can cause big trouble because
>>      mkfs creates a small (4MB) raid0 chunk for data and metadata which then
>>      goes unused for raid1/raid10 installs.
>>
>>      The allocator will happily wander in and allocate from that chunk when
>>      things get tight, which is not correct.
>>
>>      The fix here is to make sure that we provide duplication when the
>>      caller has asked for it.  It does all the dups to be any raid level,
>>      which preserves the dup->raid1 upgrade abilities.
>>
>>      Signed-off-by: Chris Mason<chris.mason@oracle.com>
>>
>> Btrfs has added the space of single chunks and raid0 chunks into the space
>> information, so when we use btrfs_check_data_free_space() to check if there
>> is some space for storing file data, this function may return true. So we
>> write the data into the cache successfully. But, the extent allocator can
>> not allocate any space to store that cached data, and then the file system
>> panic.
>>
>> I think we subtract that space from the space information, or split the space
>> information into two types, one is used to manage the chunks with duplication,
>> the other manages the other chunks.
>
> Ok, do you have a test case that triggers this?  I'll work out a patch.
> Yan Zheng's original idea of 'the chunks should be readonly' should help
> us deduct them from the total.

# mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=999999999999999999
   (fill the file system)
# umount /mnt
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
# sync

Thanks
Miao


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] can not allocate space for caching data
  2010-12-20 13:13   ` Miao Xie
@ 2010-12-20 15:41     ` Chris Mason
  2010-12-21  0:33       ` Yan, Zheng 
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Mason @ 2010-12-20 15:41 UTC (permalink / raw)
  To: miaox; +Cc: Linux Btrfs, Yan, Zheng

Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
> On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
> > Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
> >> Hi, Chris
> >>
> >> There is something wrong with this patch:
> >>
> >> commit 83a50de97fe96aca82389e061862ed760ece2283
> >> Author: Chris Mason<chris.mason@oracle.com>
> >> Date:   Mon Dec 13 15:06:46 2010 -0500
> >>
> >>      Btrfs: prevent RAID level downgrades when space is low
> >>
> >>      The extent allocator has code that allows us to fill
> >>      allocations from any available block group, even if it doesn't
> >>      match the raid level we've requested.
> >>
> >> Btrfs has added the space of single chunks and raid0 chunks into the space
> >> information, so when we use btrfs_check_data_free_space() to check if there
> >> is some space for storing file data, this function may return true. So we
> >> write the data into the cache successfully. But, the extent allocator can
> >> not allocate any space to store that cached data, and then the file system
> >> panic.
> >>
> >> I think we subtract that space from the space information, or split the space
> >> information into two types, one is used to manage the chunks with duplication,
> >> the other manages the other chunks.
> >
> > Ok, do you have a test case that triggers this?  I'll work out a patch.
> > Yan Zheng's original idea of 'the chunks should be readonly' should help
> > us deduct them from the total.
> 
> # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
> # mount /dev/sda9 /mnt
> # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=999999999999999999
>    (fill the file system)
> # umount /mnt
> # mount /dev/sda9 /mnt
> # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
> # sync

Looks like we've got an off by one bug in set_block_group_ro, which is
why our block group isn't getting set to ro.  With this patch, we're
properly setting the block group ro, and the enospc accounting is done
correctly.

It should also be able to replace my commit above.  Please take a look,
Zheng does this look correct to you?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 227e581..6f7d758 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7970,13 +7970,14 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache)
 
 	if (sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned +
 	    sinfo->bytes_may_use + sinfo->bytes_readonly +
-	    cache->reserved_pinned + num_bytes < sinfo->total_bytes) {
+	    cache->reserved_pinned + num_bytes <= sinfo->total_bytes) {
 		sinfo->bytes_readonly += num_bytes;
 		sinfo->bytes_reserved += cache->reserved_pinned;
 		cache->reserved_pinned = 0;
 		cache->ro = 1;
 		ret = 0;
 	}
+
 	spin_unlock(&cache->lock);
 	spin_unlock(&sinfo->lock);
 	return ret;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [BUG] can not allocate space for caching data
  2010-12-20 15:41     ` Chris Mason
@ 2010-12-21  0:33       ` Yan, Zheng 
  0 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng  @ 2010-12-21  0:33 UTC (permalink / raw)
  To: Chris Mason; +Cc: miaox, Linux Btrfs

On Mon, Dec 20, 2010 at 11:41 PM, Chris Mason <chris.mason@oracle.com> =
wrote:
> Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
>> On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
>> > Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
>> >> Hi, Chris
>> >>
>> >> There is something wrong with this patch:
>> >>
>> >> commit 83a50de97fe96aca82389e061862ed760ece2283
>> >> Author: Chris Mason<chris.mason@oracle.com>
>> >> Date: =A0 Mon Dec 13 15:06:46 2010 -0500
>> >>
>> >> =A0 =A0 =A0Btrfs: prevent RAID level downgrades when space is low
>> >>
>> >> =A0 =A0 =A0The extent allocator has code that allows us to fill
>> >> =A0 =A0 =A0allocations from any available block group, even if it=
 doesn't
>> >> =A0 =A0 =A0match the raid level we've requested.
>> >>
>> >> Btrfs has added the space of single chunks and raid0 chunks into =
the space
>> >> information, so when we use btrfs_check_data_free_space() to chec=
k if there
>> >> is some space for storing file data, this function may return tru=
e. So we
>> >> write the data into the cache successfully. But, the extent alloc=
ator can
>> >> not allocate any space to store that cached data, and then the fi=
le system
>> >> panic.
>> >>
>> >> I think we subtract that space from the space information, or spl=
it the space
>> >> information into two types, one is used to manage the chunks with=
 duplication,
>> >> the other manages the other chunks.
>> >
>> > Ok, do you have a test case that triggers this? =A0I'll work out a=
 patch.
>> > Yan Zheng's original idea of 'the chunks should be readonly' shoul=
d help
>> > us deduct them from the total.
>>
>> # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
>> # mount /dev/sda9 /mnt
>> # dd if=3D/dev/zero of=3D/mnt/tmpfile0 bs=3D4K count=3D9999999999999=
99999
>> =A0 =A0(fill the file system)
>> # umount /mnt
>> # mount /dev/sda9 /mnt
>> # dd if=3D/dev/zero of=3D/mnt/tmpfile1 bs=3D4K count=3D1000
>> # sync
>
> Looks like we've got an off by one bug in set_block_group_ro, which i=
s
> why our block group isn't getting set to ro. =A0With this patch, we'r=
e
> properly setting the block group ro, and the enospc accounting is don=
e
> correctly.
>
> It should also be able to replace my commit above. =A0Please take a l=
ook,
> Zheng does this look correct to you?
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 227e581..6f7d758 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7970,13 +7970,14 @@ static int set_block_group_ro(struct btrfs_bl=
ock_group_cache *cache)
>
> =A0 =A0 =A0 =A0if (sinfo->bytes_used + sinfo->bytes_reserved + sinfo-=
>bytes_pinned +
> =A0 =A0 =A0 =A0 =A0 =A0sinfo->bytes_may_use + sinfo->bytes_readonly +
> - =A0 =A0 =A0 =A0 =A0 cache->reserved_pinned + num_bytes < sinfo->tot=
al_bytes) {
> + =A0 =A0 =A0 =A0 =A0 cache->reserved_pinned + num_bytes <=3D sinfo->=
total_bytes) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0sinfo->bytes_readonly +=3D num_bytes;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0sinfo->bytes_reserved +=3D cache->rese=
rved_pinned;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cache->reserved_pinned =3D 0;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cache->ro =3D 1;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D 0;
> =A0 =A0 =A0 =A0}
> +
> =A0 =A0 =A0 =A0spin_unlock(&cache->lock);
> =A0 =A0 =A0 =A0spin_unlock(&sinfo->lock);
> =A0 =A0 =A0 =A0return ret;
>

Looks good for me,

Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-12-21  0:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-20 12:25 [BUG] can not allocate space for caching data Miao Xie
2010-12-20 12:44 ` Chris Mason
2010-12-20 13:13   ` Miao Xie
2010-12-20 15:41     ` Chris Mason
2010-12-21  0:33       ` Yan, Zheng 

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).