linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, wqu@suse.com,
	hrx@bupt.moe, waxhead@dirtcellar.net
Subject: Re: [PATCH v2 0/3] raid1 balancing methods
Date: Tue, 22 Oct 2024 08:31:25 +0800	[thread overview]
Message-ID: <f8301af6-f10d-431b-8522-af6dc015cb45@oracle.com> (raw)
In-Reply-To: <20241021184253.GB24631@suse.cz>



On 22/10/24 02:42, David Sterba wrote:
> On Mon, Oct 21, 2024 at 11:36:10PM +0800, Anand Jain wrote:
>>> I think it's safe to start with the round-round robin policy, but the
>>> syntax is strange, why the [ ] are mandatory? Also please call it
>>> round-robin, or 'rr' for short.
>>
>> I'm fine with round-robin. The [ ] part is not mandatory; if the
>> min_contiguous_read value is not specified, it will default to a
>> predefined value.
>>
>>> The default of sector size is IMHO a wrong value, switching devices so
>>> often will drop the performance just because of the io request overhead.
>>
>>>   From what I rememer values around 200k were reasonable, so either 192k
>>> or 256k should be the default. We may also drop the configurable value
>>> at all and provide a few hard coded sizes like rr-256k, rr-512k, rr-1m,
>>> if not only to drop parsing of user strings.
>>
>> I'm okay with a default value of 256k. For the experimental feature,
>> we can keep it configurable, allowing the opportunity to experiment
>> with other values as well
> 
> Yeah, for experimenting it makes sense to make it flexible, no need to
> patch and reboot the kernel. For final we should settle on some
> reasonable values.
> 
>>>> 5. Tested FIO random read/write and defrag compression workloads with
>>>>      min_contiguous_read set to sector size, 192k, and 256k.
>>>>
>>>>      RAID1 balancing method rotation is better for multi-process workloads
>>>>      such as fio and also single-process workload such as defragmentation.
>>>>
>>>>        $ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
>>>>           --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
>>>>           --time_based --group_reporting --name=iops-test-job --eta-newline=1
>>>>
>>>>
>>>> |         |            |            | Read I/O count  |
>>>> |         | Read       | Write      | devid1 | devid2 |
>>>> |---------|------------|------------|--------|--------|
>>>> | pid     | 20.3MiB/s  | 20.5MiB/s  | 313895 | 313895 |
>>>> | rotation|            |            |        |        |
>>>> |     4096| 20.4MiB/s  | 20.5MiB/s  | 313895 | 313895 |
>>>> |   196608| 20.2MiB/s  | 20.2MiB/s  | 310152 | 310175 |
>>>> |   262144| 20.3MiB/s  | 20.4MiB/s  | 312180 | 312191 |
>>>> |  latency| 18.4MiB/s  | 18.4MiB/s  | 272980 | 291683 |
>>>> | devid:1 | 14.8MiB/s  | 14.9MiB/s  | 456376 | 0      |
>>>>
>>>>      rotation RAID1 balancing technique performs more than 2x better for
>>>>      single-process defrag.
>>>>
>>>>         $ time -p btrfs filesystem defrag -r -f -c /btrfs
>>>>
>>>>
>>>> |         | Time  | Read I/O Count  |
>>>> |         | Real  | devid1 | devid2 |
>>>> |---------|-------|--------|--------|
>>>> | pid     | 18.00s| 3800   | 0      |
>>>> | rotation|       |        |        |
>>>> |     4096|  8.95s| 1900   | 1901   |
>>>> |   196608|  8.50s| 1881   | 1919   |
>>>> |   262144|  8.80s| 1881   | 1919   |
>>>> | latency | 17.18s| 3800   | 0      |
>>>> | devid:2 | 17.48s| 0      | 3800   |
>>>>
>>>> Rotation keeps all devices active, and for now, the Rotation RAID1
>>>> balancing method is preferable as default. More workload testing is
>>>> needed while the code is EXPERIMENTAL.
>>>
>>> Yeah round-robin will be a good defalt, we only need to verify the chunk
>>> size and then do the switch in the next release.
>>>
>>
>> Yes..
>>
>>>> While Latency is better during the failing/unstable block layer transport.
>>>> As of now these two techniques, are needed to be further independently
>>>> tested with different worloads, and in the long term we should be merge
>>>> these technique to a unified heuristic.
>>>
>>> This sounds like he latency is good for a specific case and maybe a
>>> fallback if the device becomes faulty, but once the layer below becomes
>>> unstable we may need to skip reading from the device. This is also a
>>> different mode of operation than balancing reads.
>>>
>>
>> If the latency on the faulty path is so high that it shouldn't pick that
>> path at all, so it works. However, the round-robin balancing is unaware
>> of dynamic faults on the device path. IMO, a round-robin method that is
>> latency aware (with ~20% variance) would be better.
> 
> We should not mix the faulty device handling mode to the read balancing,
> at least for now. A back off algorithm that checks number of failed io
> requests should precede the balancing.
> 
>>>> Rotation keeps all devices active, and for now, the Rotation RAID1
>>>> balancing method should be the default. More workload testing is needed
>>>> while the code is EXPERIMENTAL.
>>>>
>>>> Latency is smarter with unstable block layer transport.
>>>>
>>>> Both techniques need independent testing across workloads, with the goal of
>>>> eventually merging them into a unified approach? for the long term.
>>>>
>>>> Devid is a hands-on approach, provides manual or user-space script control.
>>>>
>>>> These RAID1 balancing methods are tunable via the sysfs knob.
>>>> The mount -o option and btrfs properties are under consideration.
>>>
>>> To move forward with the feature I think the round robin and preferred
>>> device id can be merged. I'm not sure about the latency but if it's
>>> under experimental we can take it as is and tune later.
>>
>> I hope the experimental feature also means we can change the name of the
>> balancing method at any time. Once we have tested a fair combination of
>> block device types, we'll definitely need a method that can
>> automatically tune based on the device type, which will require adding
>> or dropping balancing methods accordingly.
> 
> Yes we can change the names. The automatic tuning would need some
> feedback that measures the load and tries to improve the throughput,
> this is where we got stuck last time. So for now let's do some
> starightforward policy that on average works better than the current pid
> policy. I hope that tha round-robin-256k can be a good default, but of
> course we need more data for that.

Sending v3 with rotation renamed to round-robin. Code review
appreciated; I'll wait a day.

Thanks, Anand

  reply	other threads:[~2024-10-22  0:31 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-11  2:49 [PATCH v2 0/3] raid1 balancing methods Anand Jain
2024-10-11  2:49 ` [PATCH v2 1/3] btrfs: introduce RAID1 round-robin read balancing Anand Jain
2024-10-11  2:49 ` [PATCH v2 2/3] btrfs: use the path with the lowest latency for RAID1 reads Anand Jain
2024-10-11  2:49 ` [PATCH v2 3/3] btrfs: add RAID1 preferred read device Anand Jain
2024-10-11  3:35 ` [PATCH v2 0/3] raid1 balancing methods Anand Jain
2024-10-11  4:59 ` Qu Wenruo
2024-10-11  6:04   ` Anand Jain
2024-10-21 14:05 ` David Sterba
2024-10-21 15:36   ` Anand Jain
2024-10-21 18:42     ` David Sterba
2024-10-22  0:31       ` Anand Jain [this message]
2024-10-21 14:32 ` waxhead
2024-10-21 15:44   ` Anand Jain
2024-10-22  7:07   ` Johannes Thumshirn
2024-10-24  4:39 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8301af6-f10d-431b-8522-af6dc015cb45@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.com \
    --cc=dsterba@suse.cz \
    --cc=hrx@bupt.moe \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=waxhead@dirtcellar.net \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).