From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, quwenruo@cn.fujitsu.com
Subject: Re: [PATCH V2 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error
Date: Wed, 29 Mar 2017 18:00:55 +0800 [thread overview]
Message-ID: <cea41654-19b4-d96f-bb44-0f8a63cf36dc@oracle.com> (raw)
In-Reply-To: <20170328161905.GB4781@twin.jikos.cz>
On 03/29/2017 12:19 AM, David Sterba wrote:
> On Tue, Mar 14, 2017 at 04:26:11PM +0800, Anand Jain wrote:
>> The objective of this patch is to cleanup barrier_all_devices()
>> so that the error checking is in a separate loop independent of
>> of the loop which submits and waits on the device flush requests.
>
> I think that getting completely rid of the ENOMEM as suggested under
> patch 2, the split would not be necessary.
>
> The send failures will be gone so we can simply count failures from the
> waiting write_dev_flush(..., 1). There any returned error also implies
> the device stat increase for FLUSH_ERRS. Then barrier_all_devices will
> be a bit simpler.
However we need per device flush error, so we can deduce if
the volume is degrade-mountable [1]
(and so the temporary shelter for the per-device error was necessary).
[1] btrfs: Introduce a function to check if all chunks a OK for
degraded rw mount
>> By doing this it helps to further develop patches which would tune
>> the error-actions as needed.
>>
>> Here functions such as btrfs_dev_stats_dirty() couldn't be used
>> because it doesn't monitor the flush errors BTRFS_DEV_STAT_FLUSH_ERRS.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> v2: Address Qu review comments viz..
>> Add meaningful names, like cp_list (for checkpoint_list head).
>> (And actually it does not need a new struct type just to hold
>> the head pointer, list node is already named as device_checkpoint).
>> Check return value of add_device_checkpoint()
>> Check if the device is already added at add_device_checkpoint()
>> Rename fini_devices_checkpoint() to rel_devices_checkpoint()
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 5719e036048b..d0c401884643 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3566,6 +3566,86 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
>> return 0;
>> }
>>
>> +struct device_checkpoint {
>> + struct list_head cp_node;
>> + struct btrfs_device *device;
>> + int stat_value_checkpoint;
>> +};
>
> I find this approach particularly bad for several reasons.
>
> You need to store just one int per device but introduce a bloated
> temporary structure. If anything, the int could be embeded in
> btrfs_device, set and checked at the right points. Instead, you add
> memory allocations to a critical context and more potential failure
> paths.
Right. As of now we are using dev_stats_ccnt to flag if the dev_stat
contents have changed in general, however its not specific to one of
the 5 error in which flush error is one among them. Here we are
tracking only the flush errors occurred in this-commit do-barrier
request.
I think one other idea is to track flush error similar to
dev_stats_ccnt but limited only to flush error. However in the long
term we may end up similarly tracking the other errors too,
and I was bit concerned about that, so limited the changes to local.
But now I think its good to try if you are ok, Unless there is any
other better way ?
Thanks for pointing out other misses as below, the above new plan
will fix them as well.
Thanks, Anand
>> +static int add_device_checkpoint(struct btrfs_device *device,
>> + struct list_head *cp_list)
>> +{
> ...
>> + cdev = kzalloc(sizeof(struct device_checkpoint), GFP_KERNEL);
> ...
>> +}
>
> The memory allocations are GFP_KERNEL, this could lead to strange
> lockups if the allocator tries to free memory and asks the filesystem(s)
> to flush their data.
>
> Traversing the structures leads to quadratic complexity in
> check_barrier_error(), where the checkpoint list is traversed for
> each iteration of device list.
>> @@ -3574,8 +3654,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>> {
>> struct list_head *head;
>> struct btrfs_device *dev;
>> - int dropouts = 0;
>> int ret;
>> + static LIST_HEAD(cp_list);
> ^^^^^^
>
> What is this supposed to mean? What if several filesystems end up in
> barrier_all_devices modifying the list?
next prev parent reply other threads:[~2017-03-29 9:56 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-13 7:42 [PATCH 0/4] cleanup barrier_all_devices() Anand Jain
2017-03-13 7:42 ` [PATCH 1/4] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback Anand Jain
2017-03-28 15:19 ` David Sterba
2017-03-29 10:00 ` Anand Jain
2017-03-30 10:57 ` Anand Jain
2017-03-13 7:42 ` [PATCH 2/4] btrfs: Communicate back ENOMEM when it occurs Anand Jain
2017-03-14 8:49 ` Qu Wenruo
2017-03-28 15:38 ` David Sterba
2017-03-29 10:00 ` Anand Jain
2017-03-13 7:42 ` [PATCH 3/4] btrfs: cleanup barrier_all_devices() unify dev error count Anand Jain
2017-03-14 8:53 ` Qu Wenruo
2017-03-13 7:42 ` [PATCH 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error Anand Jain
2017-03-13 9:05 ` Qu Wenruo
2017-03-13 16:21 ` Anand Jain
2017-03-14 0:28 ` Qu Wenruo
2017-03-14 3:36 ` Anand Jain
2017-03-14 8:26 ` [PATCH V2 " Anand Jain
2017-03-14 8:47 ` Qu Wenruo
2017-03-28 16:19 ` David Sterba
2017-03-29 10:00 ` Anand Jain [this message]
2017-03-31 11:36 ` [PATCH 4/4 V2] " Anand Jain
2017-04-05 4:07 ` [PATCH 4/4 V3] " Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cea41654-19b4-d96f-bb44-0f8a63cf36dc@oracle.com \
--to=anand.jain@oracle.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).