* [PATCH v4 1/3] btrfs: drop uuid_mutex in btrfs_free_extra_devids()
2018-08-10 5:53 [PATCH v2 0/3] Misc volume patch set part2 Anand Jain
@ 2018-08-10 5:53 ` Anand Jain
2018-08-10 5:53 ` [PATCH 1/2] btrfs: assert for num_devices below 0 Anand Jain
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2018-08-10 5:53 UTC (permalink / raw)
To: linux-btrfs
btrfs_free_extra_devids() is called only in the mount context which
traverses through the fs_devices::devices and frees the orphan devices
devices in the given %fs_devices if any. As the search for the orphan
device is limited to fs_devices::devices so we don't need the global
uuid_mutex.
There can't be any mount-point based ioctl threads in this context as
the mount thread is not yet returned. But there can be the btrfs-control
based scan ioctls thread which calls device_list_add().
Here in the mount thread the fs_devices::opened is incremented way before
btrfs_free_extra_devids() is called and in the scan context the fs_devices
which are already opened neither be freed or alloc-able at
device_list_add().
But lets say you change the device-path and call the scan again, then scan
would update the new device path and this operation could race against the
btrfs_free_extra_devids() thread, which might be in the process of
free-ing the same device. So synchronize it by using the
device_list_mutex.
This scenario is a very corner case, and practically the scan and mount
are anyway serialized by the usage so unless the race is instrumented its
very difficult to achieve.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
---
v3->v4: As we traverse through the seed device, fs_device gets updated with
the child seed fs_devices, so make sure we use the same fs_devices
pointer for the mutex_unlock as used for the mutex_lock.
v2->v3: Update change log.
(Currently device_list_add() is very lean on its device_list_mutex usage,
a cleanup and fix is wip. Given the practicality of the above race
condition this patch is good to merge).
v1->v2: replace uuid_mutex with device_list_mutex instead of delete.
change log updated.
fs/btrfs/volumes.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..f277df935e7c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -934,8 +934,9 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
{
struct btrfs_device *device, *next;
struct btrfs_device *latest_dev = NULL;
+ struct btrfs_fs_devices *parent_fs_devices = fs_devices;
- mutex_lock(&uuid_mutex);
+ mutex_lock(&parent_fs_devices->device_list_mutex);
again:
/* This is the initialized path, it is safe to release the devices. */
list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
@@ -989,8 +990,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
}
fs_devices->latest_bdev = latest_dev->bdev;
-
- mutex_unlock(&uuid_mutex);
+ mutex_unlock(&parent_fs_devices->device_list_mutex);
}
static void free_device_rcu(struct rcu_head *head)
--
2.7.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 1/2] btrfs: assert for num_devices below 0
2018-08-10 5:53 [PATCH v2 0/3] Misc volume patch set part2 Anand Jain
2018-08-10 5:53 ` [PATCH v4 1/3] btrfs: drop uuid_mutex in btrfs_free_extra_devids() Anand Jain
@ 2018-08-10 5:53 ` Anand Jain
2018-08-10 10:46 ` David Sterba
2018-08-15 12:11 ` David Sterba
2018-08-10 5:53 ` [PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices Anand Jain
2018-08-15 12:20 ` [PATCH v2 0/3] Misc volume patch set part2 David Sterba
3 siblings, 2 replies; 10+ messages in thread
From: Anand Jain @ 2018-08-10 5:53 UTC (permalink / raw)
To: linux-btrfs
In preparation to add helper function to deduce the num_devices with
replace running, use assert instead of bug_on and warn_on.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b6d9b6a6fba7..0062615a79be 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1877,7 +1877,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
num_devices = fs_devices->num_devices;
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
- WARN_ON(num_devices < 1);
+ ASSERT(num_devices > 0);
num_devices--;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
@@ -3752,7 +3752,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
num_devices = fs_info->fs_devices->num_devices;
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
- BUG_ON(num_devices < 1);
+ ASSERT(num_devices > 0);
num_devices--;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
--
2.7.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] btrfs: assert for num_devices below 0
2018-08-10 5:53 ` [PATCH 1/2] btrfs: assert for num_devices below 0 Anand Jain
@ 2018-08-10 10:46 ` David Sterba
2018-08-15 12:11 ` David Sterba
1 sibling, 0 replies; 10+ messages in thread
From: David Sterba @ 2018-08-10 10:46 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 10, 2018 at 01:53:20PM +0800, Anand Jain wrote:
> In preparation to add helper function to deduce the num_devices with
> replace running, use assert instead of bug_on and warn_on.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
Ok for the updated condition as it's going to be used in the new helper.
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] btrfs: assert for num_devices below 0
2018-08-10 5:53 ` [PATCH 1/2] btrfs: assert for num_devices below 0 Anand Jain
2018-08-10 10:46 ` David Sterba
@ 2018-08-15 12:11 ` David Sterba
2018-08-16 2:09 ` Anand Jain
1 sibling, 1 reply; 10+ messages in thread
From: David Sterba @ 2018-08-15 12:11 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 10, 2018 at 01:53:20PM +0800, Anand Jain wrote:
> In preparation to add helper function to deduce the num_devices with
> replace running, use assert instead of bug_on and warn_on.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/volumes.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b6d9b6a6fba7..0062615a79be 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1877,7 +1877,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> num_devices = fs_devices->num_devices;
> btrfs_dev_replace_read_lock(&fs_info->dev_replace);
> if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
> - WARN_ON(num_devices < 1);
> + ASSERT(num_devices > 0);
> num_devices--;
I was about to merge the patch but this gave me another opportunity to
look at the code: the assertion should check for > 1. The value 1 of
num_devices is sligthly wrong here and would lead to 0 returned from the
function.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] btrfs: assert for num_devices below 0
2018-08-15 12:11 ` David Sterba
@ 2018-08-16 2:09 ` Anand Jain
0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2018-08-16 2:09 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 08/15/2018 08:11 PM, David Sterba wrote:
> On Fri, Aug 10, 2018 at 01:53:20PM +0800, Anand Jain wrote:
>> In preparation to add helper function to deduce the num_devices with
>> replace running, use assert instead of bug_on and warn_on.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/volumes.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b6d9b6a6fba7..0062615a79be 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1877,7 +1877,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>> num_devices = fs_devices->num_devices;
>> btrfs_dev_replace_read_lock(&fs_info->dev_replace);
>> if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
>> - WARN_ON(num_devices < 1);
>> + ASSERT(num_devices > 0);
>> num_devices--;
>
> I was about to merge the patch but this gave me another opportunity to
> look at the code: the assertion should check for > 1. The value 1 of
> num_devices is sligthly wrong here and would lead to 0 returned from the
> function.
Agree its slightly wrong but I kept it as it is to match with the
original code. I mentioned about it in the cover letter.
Thanks, Anand
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices
2018-08-10 5:53 [PATCH v2 0/3] Misc volume patch set part2 Anand Jain
2018-08-10 5:53 ` [PATCH v4 1/3] btrfs: drop uuid_mutex in btrfs_free_extra_devids() Anand Jain
2018-08-10 5:53 ` [PATCH 1/2] btrfs: assert for num_devices below 0 Anand Jain
@ 2018-08-10 5:53 ` Anand Jain
2018-08-10 10:45 ` David Sterba
2018-08-15 12:20 ` [PATCH v2 0/3] Misc volume patch set part2 David Sterba
3 siblings, 1 reply; 10+ messages in thread
From: Anand Jain @ 2018-08-10 5:53 UTC (permalink / raw)
To: linux-btrfs
When the replace is running the fs_devices::num_devices also includes
the replace device, however in some operations like device delete and
balance it needs the actual num_devices without the repalce devices, so
now the function btrfs_num_devices() just provides that.
And here is a scenario how balance and repalce items could co-exist.
Consider balance is started and paused, now start the replace
followed by a unmount or power-recycle of the system. During following
mount, the open_ctree() first restarts the balance so it must check for
the replace device otherwise our num_devices calculation will be wrong.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v4->v5: uses assert.
v3->v4: add comment and drop the inline (sorry missed it before)
v2->v3: update changelog with not so obvious balance and repalce
co-existance secnario
v1->v2: add comments
fs/btrfs/volumes.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0062615a79be..630f9ec158d0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1863,6 +1863,21 @@ void btrfs_assign_next_active_device(struct btrfs_device *device,
fs_info->fs_devices->latest_bdev = next_device->bdev;
}
+/* Returns btrfs_fs_devices::num_devices excluding replace device if any */
+static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
+{
+ u64 num_devices = fs_info->fs_devices->num_devices;
+
+ btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+ if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+ ASSERT(num_devices > 0);
+ num_devices--;
+ }
+ btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+
+ return num_devices;
+}
+
int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
u64 devid)
{
@@ -1874,13 +1889,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
mutex_lock(&uuid_mutex);
- num_devices = fs_devices->num_devices;
- btrfs_dev_replace_read_lock(&fs_info->dev_replace);
- if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
- ASSERT(num_devices > 0);
- num_devices--;
- }
- btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+ num_devices = btrfs_num_devices(fs_info);
ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
if (ret)
@@ -3749,13 +3758,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}
}
- num_devices = fs_info->fs_devices->num_devices;
- btrfs_dev_replace_read_lock(&fs_info->dev_replace);
- if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
- ASSERT(num_devices > 0);
- num_devices--;
- }
- btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+ num_devices = btrfs_num_devices(fs_info);
+
allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
if (num_devices > 1)
allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
--
2.7.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices
2018-08-10 5:53 ` [PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices Anand Jain
@ 2018-08-10 10:45 ` David Sterba
0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2018-08-10 10:45 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 10, 2018 at 01:53:21PM +0800, Anand Jain wrote:
> When the replace is running the fs_devices::num_devices also includes
> the replace device, however in some operations like device delete and
> balance it needs the actual num_devices without the repalce devices, so
> now the function btrfs_num_devices() just provides that.
>
> And here is a scenario how balance and repalce items could co-exist.
> Consider balance is started and paused, now start the replace
> followed by a unmount or power-recycle of the system. During following
> mount, the open_ctree() first restarts the balance so it must check for
> the replace device otherwise our num_devices calculation will be wrong.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> v4->v5: uses assert.
> v3->v4: add comment and drop the inline (sorry missed it before)
> v2->v3: update changelog with not so obvious balance and repalce
> co-existance secnario
> v1->v2: add comments
>
> fs/btrfs/volumes.c | 32 ++++++++++++++++++--------------
> 1 file changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0062615a79be..630f9ec158d0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1863,6 +1863,21 @@ void btrfs_assign_next_active_device(struct btrfs_device *device,
> fs_info->fs_devices->latest_bdev = next_device->bdev;
> }
>
> +/* Returns btrfs_fs_devices::num_devices excluding replace device if any */
> +static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
> +{
> + u64 num_devices = fs_info->fs_devices->num_devices;
> +
> + btrfs_dev_replace_read_lock(&fs_info->dev_replace);
> + if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
> + ASSERT(num_devices > 0);
> + num_devices--;
> + }
> + btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
I'll move the assert with the updated condition here so it covers also
the non dev-replace case. Otherwise ok.
> +
> + return num_devices;
> +}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] Misc volume patch set part2
2018-08-10 5:53 [PATCH v2 0/3] Misc volume patch set part2 Anand Jain
` (2 preceding siblings ...)
2018-08-10 5:53 ` [PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices Anand Jain
@ 2018-08-15 12:20 ` David Sterba
2018-08-16 2:59 ` Anand Jain
3 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2018-08-15 12:20 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 10, 2018 at 01:53:18PM +0800, Anand Jain wrote:
> The patch set (2/3 and 3/3) adds helper function to deduce the num_device
> (which is the number of the devices when device is mounted, this excludes
> the seed device however includes the replacing target). We need to know
> the num_device that actually belongs to the FSID without the replacing
> target (if it exists) when we are balancing and or when we are trying to
> device delete. In both of these cases, it would be wrong to consider
> replacing target when reallocating the blockgroups.
>
> This patch as such does not change any of the logic which is already in
> the original code, but instead it would bring such a logic at two
> locations into a single place/function. But to do that as these two
> location uses either WARN_ON or BUG_ON to perform the safety catch so
> the first patch does the cleanup and uses ASSERT at both of these
> places. And now since these two sections of codes are identical, the 2nd
> patch replaces them into a helper function.
>
> Also though the check for num_device > 0 should actually be num_device >
> 1 OR instead, the check of num_device > 0 should have been after the
> num_device--,
I updated the patch following that logic now.
> since we have had already too many comments and added
> confusions, I shall leave that part to be fixed at sometime later. So
> that the cleanup here is a really a cleanup instead of chaning the
> logic in a rough scale.
>
> Last but not least, this patch has gone through a lot of comments on
> either to use BUG or return -EINVAL for accidental fall of num_device
> below 1, which I wasn't expecting so much of the concerns about
> the use of BUG_ON/-EINVAL as its trivial because fundamentally a FS
> can't be in the mounted state when num_device < 1.
>
>
> Anand Jain (3):
> btrfs: drop uuid_mutex in btrfs_free_extra_devids()
> btrfs: assert for num_devices below 0
> btrfs: add helper btrfs_num_devices() to deduce num_devices
Patches 2 and 3 are applied, with some adjustments. The first patch
needs another round of review as there were many changes regarding the
device locking and uuid_mutex. I need a break from this code as the
wounds haven't healed yet, but will add the patch to a topic branch to
next so I don't forget about it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] Misc volume patch set part2
2018-08-15 12:20 ` [PATCH v2 0/3] Misc volume patch set part2 David Sterba
@ 2018-08-16 2:59 ` Anand Jain
0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2018-08-16 2:59 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 08/15/2018 08:20 PM, David Sterba wrote:
> On Fri, Aug 10, 2018 at 01:53:18PM +0800, Anand Jain wrote:
>> The patch set (2/3 and 3/3) adds helper function to deduce the num_device
>> (which is the number of the devices when device is mounted, this excludes
>> the seed device however includes the replacing target). We need to know
>> the num_device that actually belongs to the FSID without the replacing
>> target (if it exists) when we are balancing and or when we are trying to
>> device delete. In both of these cases, it would be wrong to consider
>> replacing target when reallocating the blockgroups.
>>
>> This patch as such does not change any of the logic which is already in
>> the original code, but instead it would bring such a logic at two
>> locations into a single place/function. But to do that as these two
>> location uses either WARN_ON or BUG_ON to perform the safety catch so
>> the first patch does the cleanup and uses ASSERT at both of these
>> places. And now since these two sections of codes are identical, the 2nd
>> patch replaces them into a helper function.
>>
>> Also though the check for num_device > 0 should actually be num_device >
>> 1 OR instead, the check of num_device > 0 should have been after the
>> num_device--,
>
> I updated the patch following that logic now.
Right thanks.
>> since we have had already too many comments and added
>> confusions, I shall leave that part to be fixed at sometime later. So
>> that the cleanup here is a really a cleanup instead of chaning the
>> logic in a rough scale.
>>
>> Last but not least, this patch has gone through a lot of comments on
>> either to use BUG or return -EINVAL for accidental fall of num_device
>> below 1, which I wasn't expecting so much of the concerns about
>> the use of BUG_ON/-EINVAL as its trivial because fundamentally a FS
>> can't be in the mounted state when num_device < 1.
>>
>>
>> Anand Jain (3):
>> btrfs: drop uuid_mutex in btrfs_free_extra_devids()
>> btrfs: assert for num_devices below 0
>> btrfs: add helper btrfs_num_devices() to deduce num_devices
>
> Patches 2 and 3 are applied, with some adjustments.
Adjustments at [1] are correct.
[1]
gitlab.com/kdave/btrfs-devel.git misc-next
Thanks, Anand.
> The first patch
> needs another round of review as there were many changes regarding the
> device locking and uuid_mutex. I need a break from this code as the
> wounds haven't healed yet, but will add the patch to a topic branch to
> next so I don't forget about it.
^ permalink raw reply [flat|nested] 10+ messages in thread