From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from aserp2130.oracle.com ([141.146.126.79]:41470 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752009AbeAaJ1L (ORCPT ); Wed, 31 Jan 2018 04:27:11 -0500 Subject: Re: [PATCH 2/2] btrfs: add read_mirror_policy parameter devid To: Nikolay Borisov , linux-btrfs@vger.kernel.org References: <20180130063020.14850-1-anand.jain@oracle.com> <20180130063020.14850-3-anand.jain@oracle.com> From: Anand Jain Message-ID: <7a21d5f0-b7f0-9da6-22b6-b45976d6ab40@oracle.com> Date: Wed, 31 Jan 2018 17:28:26 +0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 01/31/2018 04:38 PM, Nikolay Borisov wrote: > > > On 30.01.2018 08:30, Anand Jain wrote: >> Adds the mount option: >> mount -o read_mirror_policy= >> >> To set the devid of the device which should be used for read. That >> means all the normal reads will go to that particular device only. >> >> This also helps testing and gives a better control for the test >> scripts including mount context reads. > > Some code comments below. OTOH, does such policy really make sense, what > happens if the selected device fails, will the other mirror be retried? Everything as usual, read_mirror_policy=devid just lets the user to specify his read optimized disk, so that we don't depend on the pid to pick a stripe mirrored disk, and instead we would pick as suggested by the user, and if that disk fails then we go back to the other mirror which may not be the read optimized disk as we have no other choice. > If the answer to the previous question is positive then why do we really > care which device is going to be tried first? It matters. - If you are reading from both disks alternatively, then it duplicates the LUN cache on the storage. - Some disks are read-optimized and using that for reading and going back to the other disk only when this disk fails provides a better overall read performance. :: >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 39ba59832f38..478623e6e074 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -5270,6 +5270,16 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info, >> num = map->num_stripes; >> >> switch(fs_info->read_mirror_policy) { >> + case BTRFS_READ_MIRROR_BY_DEV: >> + optimal = first; >> + if (test_bit(BTRFS_DEV_STATE_READ_MIRROR, >> + &map->stripes[optimal].dev->dev_state)) >> + break; >> + if (test_bit(BTRFS_DEV_STATE_READ_MIRROR, >> + &map->stripes[++optimal].dev->dev_state)) >> + break; >> + optimal = first; > > you set optimal 2 times, the second one seems redundant. No actually. When both the disks containing the stripe does not have the BTRFS_DEV_STATE_READ_MIRROR, then I would just want to use first found stripe. > Alongside this > patch it makes sense to also send a patch to btrfs(5) man page > describing the mount option + description of each implemented allocation > policy. Yep. Will do. > Another thing which I don't see here is how you are handling the case > when you have more than 2 devices in the RAID1 case. As it stands > currently you assume there are two devices and first test device 0 and > then device 1 and completely ignore any other devices. Not really. That part is already handled by the extent mapping. As the number of stripe for raid1 is two, the extent mapping will manage put related two devices of this stripe. Thanks, Anand