Re: [PATCH V2 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, quwenruo@cn.fujitsu.com
Subject: Re: [PATCH V2 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error
Date: Wed, 29 Mar 2017 18:00:55 +0800	[thread overview]
Message-ID: <cea41654-19b4-d96f-bb44-0f8a63cf36dc@oracle.com> (raw)
In-Reply-To: <20170328161905.GB4781@twin.jikos.cz>



On 03/29/2017 12:19 AM, David Sterba wrote:
> On Tue, Mar 14, 2017 at 04:26:11PM +0800, Anand Jain wrote:
>> The objective of this patch is to cleanup barrier_all_devices()
>> so that the error checking is in a separate loop independent of
>> of the loop which submits and waits on the device flush requests.
>
> I think that getting completely rid of the ENOMEM as suggested under
> patch 2, the split would not be necessary.
>
> The send failures will be gone so we can simply count failures from the
> waiting write_dev_flush(..., 1). There any returned error also implies
> the device stat increase for FLUSH_ERRS. Then barrier_all_devices will
> be a bit simpler.

  However we need per device flush error, so we can deduce if
  the volume is degrade-mountable [1]
  (and so the temporary shelter for the per-device error was necessary).

  [1] btrfs: Introduce a function to check if all chunks a OK for 
degraded rw mount


>> By doing this it helps to further develop patches which would tune
>> the error-actions as needed.
>>
>> Here functions such as btrfs_dev_stats_dirty() couldn't be used
>> because it doesn't monitor the flush errors BTRFS_DEV_STAT_FLUSH_ERRS.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> v2: Address Qu review comments viz..
>>      Add meaningful names, like cp_list (for checkpoint_list head).
>>      (And actually it does not need a new struct type just to hold
>>       the head pointer, list node is already named as device_checkpoint).
>>      Check return value of add_device_checkpoint()
>>      Check if the device is already added at add_device_checkpoint()
>>      Rename fini_devices_checkpoint() to rel_devices_checkpoint()
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 5719e036048b..d0c401884643 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3566,6 +3566,86 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
>>  	return 0;
>>  }
>>
>> +struct device_checkpoint {
>> +	struct list_head cp_node;
>> +	struct btrfs_device *device;
>> +	int stat_value_checkpoint;
>> +};
>
> I find this approach particularly bad for several reasons.
>
> You need to store just one int per device but introduce a bloated
> temporary structure. If anything, the int could be embeded in
> btrfs_device, set and checked at the right points. Instead, you add
> memory allocations to a critical context and more potential failure
> paths.

  Right. As of now we are using dev_stats_ccnt to flag if the dev_stat
  contents have changed in general, however its not specific to one of
  the 5 error in which flush error is one among them. Here we are
  tracking only the flush errors occurred in this-commit do-barrier
  request.

  I think one other idea is to track flush error similar to
  dev_stats_ccnt but limited only to flush error. However in the long
  term we may end up similarly tracking the other errors too,
  and I was bit concerned about that, so limited the changes to local.
  But now I think its good to try if you are ok, Unless there is any
  other better way ?

  Thanks for pointing out other misses as below, the above new plan
  will fix them as well.

Thanks, Anand


>> +static int add_device_checkpoint(struct btrfs_device *device,
>> +				struct list_head *cp_list)
>> +{
> ...
>> +	cdev = kzalloc(sizeof(struct device_checkpoint), GFP_KERNEL);
> ...
>> +}
>
> The memory allocations are GFP_KERNEL, this could lead to strange
> lockups if the allocator tries to free memory and asks the filesystem(s)
> to flush their data.
>
> Traversing the structures leads to quadratic complexity in
> check_barrier_error(), where the checkpoint list is traversed for
> each iteration of device list.


>> @@ -3574,8 +3654,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>  {
>>  	struct list_head *head;
>>  	struct btrfs_device *dev;
>> -	int dropouts = 0;
>>  	int ret;
>> +	static LIST_HEAD(cp_list);
>         ^^^^^^
>
> What is this supposed to mean? What if several filesystems end up in
> barrier_all_devices modifying the list?

next prev parent reply	other threads:[~2017-03-29  9:56 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-13  7:42 [PATCH 0/4] cleanup barrier_all_devices() Anand Jain
2017-03-13  7:42 ` [PATCH 1/4] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback Anand Jain
2017-03-28 15:19   ` David Sterba
2017-03-29 10:00     ` Anand Jain
2017-03-30 10:57       ` Anand Jain
2017-03-13  7:42 ` [PATCH 2/4] btrfs: Communicate back ENOMEM when it occurs Anand Jain
2017-03-14  8:49   ` Qu Wenruo
2017-03-28 15:38   ` David Sterba
2017-03-29 10:00     ` Anand Jain
2017-03-13  7:42 ` [PATCH 3/4] btrfs: cleanup barrier_all_devices() unify dev error count Anand Jain
2017-03-14  8:53   ` Qu Wenruo
2017-03-13  7:42 ` [PATCH 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error Anand Jain
2017-03-13  9:05   ` Qu Wenruo
2017-03-13 16:21     ` Anand Jain
2017-03-14  0:28       ` Qu Wenruo
2017-03-14  3:36         ` Anand Jain
2017-03-14  8:26   ` [PATCH V2 " Anand Jain
2017-03-14  8:47     ` Qu Wenruo
2017-03-28 16:19     ` David Sterba
2017-03-29 10:00       ` Anand Jain [this message]
2017-03-31 11:36   ` [PATCH 4/4 V2] " Anand Jain
2017-04-05  4:07   ` [PATCH 4/4 V3] " Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cea41654-19b4-d96f-bb44-0f8a63cf36dc@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).