linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
@ 2012-01-17 10:02 Miao Xie
  2012-01-17 20:58 ` Chris Mason
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Miao Xie @ 2012-01-17 10:02 UTC (permalink / raw)
  To: Linux Btrfs

If there is no free space, the free space allocator will try to get space from
the block group with the degenerated profile. For example, if there is no free
space in the RAID1 block groups, the allocator will try to allocate space from
the DUP block groups. And besides that, the space reservation has the similar
behaviour: if there is no enough space in the space cache to reserve, it will
reserve the space according to the disk space, and it just take mirror storage
into account, no RAID0, RAID1, or RAID10.

So we'd better make the behaviour of chunk allocation correspond with space
reservation and free space allocation, if there is no enough disk space to
allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try
to allocate chunks again. Otherwise, enospc will happen though we reserve
the space successfully and BUG_ON() will be triggered.

Degenerating rule:
  RAID10 -> RAID1 -> DUP
  RAID0 -> SINGLE

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3e68e2b..87cd611 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3065,6 +3065,30 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	return flags;
 }
 
+/*
+ * Degenerate the alloc profile:
+ *   RAID10 -> RAID1 -> DUP
+ *   RAID0 -> SINGLE
+ *
+ * This is used when there is no enough disk space to do chunk allocation.
+ * After degenerating the profile, we will try to allocate new chunks again.
+ */
+static u64 btrfs_degenerate_alloc_profile(u64 flags)
+{
+	if (flags & BTRFS_BLOCK_GROUP_RAID10) {
+		flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+		flags |= BTRFS_BLOCK_GROUP_RAID1;
+	} else if (flags & BTRFS_BLOCK_GROUP_RAID1) {
+		flags &= ~BTRFS_BLOCK_GROUP_RAID1;
+		flags |= BTRFS_BLOCK_GROUP_DUP;
+	} else if (flags & BTRFS_BLOCK_GROUP_RAID0) {
+		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+	} else
+		flags = ULLONG_MAX;
+
+	return flags;
+}
+
 static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
 {
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
@@ -3356,8 +3380,23 @@ again:
 	}
 
 	ret = btrfs_alloc_chunk(trans, extent_root, flags);
-	if (ret < 0 && ret != -ENOSPC)
-		goto out;
+	if (ret < 0) {
+		if (ret != -ENOSPC)
+			goto out;
+
+		/*
+		 * Degenerate the alloc profile:
+		 *   RAID10 -> RAID1 -> DUP
+		 *   RAID0 -> SINGLE
+		 * then we will try to allocate new chunks again. By this way,
+		 * we can utilize the whole disk spacem and make the behaviour
+		 * of the chunk allocation correspond with the space reservation
+		 * and the free space allocation.
+		 */
+		flags = btrfs_degenerate_alloc_profile(flags);
+		if (flags != ULLONG_MAX)
+			goto again;
+	}
 
 	spin_lock(&space_info->lock);
 	if (ret)
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-17 10:02 [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile Miao Xie
@ 2012-01-17 20:58 ` Chris Mason
  2012-01-18 10:12   ` Jan Schmidt
  2012-01-18 10:14 ` Arne Jansen
  2012-01-18 12:34 ` David Sterba
  2 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2012-01-17 20:58 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

These two didn't make my first pull request just because I wanted to get
something out the door.  I'll definitely have them in the next pull.

-chris


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-17 20:58 ` Chris Mason
@ 2012-01-18 10:12   ` Jan Schmidt
  2012-01-18 12:41     ` Hugo Mills
  2012-01-19  5:58     ` Miao Xie
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Schmidt @ 2012-01-18 10:12 UTC (permalink / raw)
  To: Chris Mason, Miao Xie, Linux Btrfs

On 17.01.2012 21:58, Chris Mason wrote:
> These two didn't make my first pull request just because I wanted to get
> something out the door.  I'll definitely have them in the next pull.

Please, don't do that! You can't just degenerate to DUP when RAID1 is
out of space, that's entirely different.

It's debatable whether degeneration from RAID0 to single is acceptable,
but that again has different characteristics.

ENOSPC is the best choice for both in my opinon.

-Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-17 10:02 [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile Miao Xie
  2012-01-17 20:58 ` Chris Mason
@ 2012-01-18 10:14 ` Arne Jansen
  2012-01-18 12:34 ` David Sterba
  2 siblings, 0 replies; 9+ messages in thread
From: Arne Jansen @ 2012-01-18 10:14 UTC (permalink / raw)
  To: miaox; +Cc: Linux Btrfs

On 17.01.2012 11:02, Miao Xie wrote:
> If there is no free space, the free space allocator will try to get space from
> the block group with the degenerated profile. For example, if there is no free
> space in the RAID1 block groups, the allocator will try to allocate space from
> the DUP block groups. And besides that, the space reservation has the similar
> behaviour: if there is no enough space in the space cache to reserve, it will
> reserve the space according to the disk space, and it just take mirror storage
> into account, no RAID0, RAID1, or RAID10.
> 
> So we'd better make the behaviour of chunk allocation correspond with space
> reservation and free space allocation, if there is no enough disk space to
> allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try
> to allocate chunks again. Otherwise, enospc will happen though we reserve
> the space successfully and BUG_ON() will be triggered.
> 
> Degenerating rule:
>   RAID10 -> RAID1 -> DUP
>   RAID0 -> SINGLE
> 

Instead of changing the profile, wouldn't it be easier to just allow
RAID10 go down to 2 disks and RAID0 to 1? That would make things easier
in many places.
What I'm strongly opposed to is changing a RAID1 to DUP, as this loses
the protection against a single disk failure.

-Arne

> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c |   43 +++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3e68e2b..87cd611 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3065,6 +3065,30 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
>  	return flags;
>  }
>  
> +/*
> + * Degenerate the alloc profile:
> + *   RAID10 -> RAID1 -> DUP
> + *   RAID0 -> SINGLE
> + *
> + * This is used when there is no enough disk space to do chunk allocation.
> + * After degenerating the profile, we will try to allocate new chunks again.
> + */
> +static u64 btrfs_degenerate_alloc_profile(u64 flags)
> +{
> +	if (flags & BTRFS_BLOCK_GROUP_RAID10) {
> +		flags &= ~BTRFS_BLOCK_GROUP_RAID10;
> +		flags |= BTRFS_BLOCK_GROUP_RAID1;
> +	} else if (flags & BTRFS_BLOCK_GROUP_RAID1) {
> +		flags &= ~BTRFS_BLOCK_GROUP_RAID1;
> +		flags |= BTRFS_BLOCK_GROUP_DUP;
> +	} else if (flags & BTRFS_BLOCK_GROUP_RAID0) {
> +		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
> +	} else
> +		flags = ULLONG_MAX;
> +
> +	return flags;
> +}
> +
>  static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
>  {
>  	if (flags & BTRFS_BLOCK_GROUP_DATA)
> @@ -3356,8 +3380,23 @@ again:
>  	}
>  
>  	ret = btrfs_alloc_chunk(trans, extent_root, flags);
> -	if (ret < 0 && ret != -ENOSPC)
> -		goto out;
> +	if (ret < 0) {
> +		if (ret != -ENOSPC)
> +			goto out;
> +
> +		/*
> +		 * Degenerate the alloc profile:
> +		 *   RAID10 -> RAID1 -> DUP
> +		 *   RAID0 -> SINGLE
> +		 * then we will try to allocate new chunks again. By this way,
> +		 * we can utilize the whole disk spacem and make the behaviour
> +		 * of the chunk allocation correspond with the space reservation
> +		 * and the free space allocation.
> +		 */
> +		flags = btrfs_degenerate_alloc_profile(flags);
> +		if (flags != ULLONG_MAX)
> +			goto again;
> +	}
>  
>  	spin_lock(&space_info->lock);
>  	if (ret)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-17 10:02 [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile Miao Xie
  2012-01-17 20:58 ` Chris Mason
  2012-01-18 10:14 ` Arne Jansen
@ 2012-01-18 12:34 ` David Sterba
  2 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2012-01-18 12:34 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

On Tue, Jan 17, 2012 at 06:02:32PM +0800, Miao Xie wrote:
> So we'd better make the behaviour of chunk allocation correspond with space
> reservation and free space allocation, if there is no enough disk space to
> allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try
> to allocate chunks again. Otherwise, enospc will happen though we reserve
> the space successfully and BUG_ON() will be triggered.
> 
> Degenerating rule:
>   RAID10 -> RAID1 -> DUP
>   RAID0 -> SINGLE

I think that silently changing RAID level under unpredictable conditions
could be a very unpleasant suprise to the administrator. Even worse when
it requires manual intervention to undo the change and without any
possibility to prevent it to happen again.

In case ENOSPC happens early due to overbooking (like untarring lots of
files as reported several times), will probably make it happen sooner
than when "there is absolutely no free space available". We will lose
automatic raid1-repair and scrub repair capabilities, that's not good.


david

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-18 10:12   ` Jan Schmidt
@ 2012-01-18 12:41     ` Hugo Mills
  2012-01-18 13:42       ` Roman Kapusta
  2012-01-19  5:58     ` Miao Xie
  1 sibling, 1 reply; 9+ messages in thread
From: Hugo Mills @ 2012-01-18 12:41 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: Chris Mason, Miao Xie, Linux Btrfs

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On Wed, Jan 18, 2012 at 11:12:20AM +0100, Jan Schmidt wrote:
> On 17.01.2012 21:58, Chris Mason wrote:
> > These two didn't make my first pull request just because I wanted to get
> > something out the door.  I'll definitely have them in the next pull.
> 
> Please, don't do that! You can't just degenerate to DUP when RAID1 is
> out of space, that's entirely different.

   Agreed. This isn't a good idea.

> It's debatable whether degeneration from RAID0 to single is acceptable,
> but that again has different characteristics.

   RAID-0 to single and RAID-10 to RAID-1 are less controversial, I
think, although we need to be clear about what the performance
guarantees of the striping are (and how they would be affected by the
move to fewer disks). Given that we already degrade the striping of
RAID-0 down to "as many devices as we can fit right now", we're not
really providing much in the way of performance guarantees.

   (I'd definitely support some way of defining a fixed-width stripe,
but that's a whole separate question that we shouldn't get into now).

> ENOSPC is the best choice for both in my opinon.
> 
> -Jan

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- What's a Nazgûl like you doing in a place like this? ---      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-18 12:41     ` Hugo Mills
@ 2012-01-18 13:42       ` Roman Kapusta
  0 siblings, 0 replies; 9+ messages in thread
From: Roman Kapusta @ 2012-01-18 13:42 UTC (permalink / raw)
  To: Hugo Mills, Jan Schmidt, Chris Mason, Miao Xie, Linux Btrfs

On Wed, Jan 18, 2012 at 13:41, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Wed, Jan 18, 2012 at 11:12:20AM +0100, Jan Schmidt wrote:
>> On 17.01.2012 21:58, Chris Mason wrote:
>> > These two didn't make my first pull request just because I wanted =
to get
>> > something out the door. =C2=A0I'll definitely have them in the nex=
t pull.
>>
>> Please, don't do that! You can't just degenerate to DUP when RAID1 i=
s
>> out of space, that's entirely different.
>
> =C2=A0 Agreed. This isn't a good idea.
>
>> It's debatable whether degeneration from RAID0 to single is acceptab=
le,
>> but that again has different characteristics.
>
> =C2=A0 RAID-0 to single and RAID-10 to RAID-1 are less controversial,=
 I
> think, although we need to be clear about what the performance
> guarantees of the striping are (and how they would be affected by the
> move to fewer disks). Given that we already degrade the striping of
> RAID-0 down to "as many devices as we can fit right now", we're not
> really providing much in the way of performance guarantees.

Definitely this degeneration is good idea, but it should be only
optional and not default behavior of filesystem, default should be
ENOSPC.

>
> =C2=A0 (I'd definitely support some way of defining a fixed-width str=
ipe,
> but that's a whole separate question that we shouldn't get into now).
>
>> ENOSPC is the best choice for both in my opinon.
>>
>> -Jan
>
> --
> =3D=3D=3D Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.=
org.uk =3D=3D=3D
> =C2=A0PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.=
org.uk
> =C2=A0 =C2=A0 =C2=A0--- What's a Nazg=C3=BBl like you doing in a plac=
e like this? ---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-18 10:12   ` Jan Schmidt
  2012-01-18 12:41     ` Hugo Mills
@ 2012-01-19  5:58     ` Miao Xie
  2012-01-20 10:36       ` Jan Schmidt
  1 sibling, 1 reply; 9+ messages in thread
From: Miao Xie @ 2012-01-19  5:58 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: Chris Mason, Linux Btrfs

On 	wed, 18 Jan 2012 11:12:20 +0100, Jan Schmidt wrote:
> On 17.01.2012 21:58, Chris Mason wrote:
>> These two didn't make my first pull request just because I wanted to get
>> something out the door.  I'll definitely have them in the next pull.
> 
> Please, don't do that! You can't just degenerate to DUP when RAID1 is
> out of space, that's entirely different.
> 
> It's debatable whether degeneration from RAID0 to single is acceptable,
> but that again has different characteristics.
> 
> ENOSPC is the best choice for both in my opinon.

I understand what you said, but in fact, the free space allocator can degenerate
the profile if it doesn't find enough free space. This patch just follows the
rule which exists in the current code.

Maybe adding a new mount option is another good option.

Thanks
Miao

> 
> -Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
  2012-01-19  5:58     ` Miao Xie
@ 2012-01-20 10:36       ` Jan Schmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Schmidt @ 2012-01-20 10:36 UTC (permalink / raw)
  To: miaox; +Cc: Chris Mason, Linux Btrfs

On 19.01.2012 06:58, Miao Xie wrote:
> On 	wed, 18 Jan 2012 11:12:20 +0100, Jan Schmidt wrote:
>> On 17.01.2012 21:58, Chris Mason wrote:
>>> These two didn't make my first pull request just because I wanted to get
>>> something out the door.  I'll definitely have them in the next pull.
>>
>> Please, don't do that! You can't just degenerate to DUP when RAID1 is
>> out of space, that's entirely different.
>>
>> It's debatable whether degeneration from RAID0 to single is acceptable,
>> but that again has different characteristics.
>>
>> ENOSPC is the best choice for both in my opinon.
> 
> I understand what you said, but in fact, the free space allocator can degenerate
> the profile if it doesn't find enough free space. This patch just follows the
> rule which exists in the current code.

I'm not sure what you mean with "free space allocation". In my
understanding, new "free" space is made available by allocating a new
chunk, and that's what you're suggesting to change. What am I missing here?

Calculation of free space like for df output is known to be at least
unintuitive (I'd say wrong). We'd better fix that.

Space reservation may be wrongly assuming it can use the whole disk. If
that's the case, we must fix it.

> Maybe adding a new mount option is another good option.

I don't think so. If you want RAID-1, you get RAID-1, i.e. every stripe
on two disks. If you're okay with DUP, use DUP. But even with a mount
option, degeneration would still go silent and you'd never know which
part of your data would survive a single disk failure.

Think of degenration of your metadata profile to DUP: Suddenly, your
most recent extent tree wouldn't survive a single disk failure.

In my opinion, we're responsible not to offer any dangerous (mount)
options that can make users lose data eventually. Even if we can point
them to the documentation (which they didn't read) that explains the
risks of that option.

-Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-01-20 10:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-17 10:02 [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile Miao Xie
2012-01-17 20:58 ` Chris Mason
2012-01-18 10:12   ` Jan Schmidt
2012-01-18 12:41     ` Hugo Mills
2012-01-18 13:42       ` Roman Kapusta
2012-01-19  5:58     ` Miao Xie
2012-01-20 10:36       ` Jan Schmidt
2012-01-18 10:14 ` Arne Jansen
2012-01-18 12:34 ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).