* [PATCH] btrfs: balance RAID1/RAID10 mirror selection
@ 2020-10-16 5:59 louis
2020-10-16 7:15 ` Nikolay Borisov
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: louis @ 2020-10-16 5:59 UTC (permalink / raw)
To: linux-btrfs
Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double throughput for large reads.
Signed-off-by: Louis Jencka <louis@waffle.tech>
---
fs/btrfs/volumes.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58b9c419a..45c581d46 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -333,6 +333,9 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void)
return &fs_uuids;
}
+/* Used for round-robin balancing RAID1/RAID10 reads. */
+atomic_t rr_counter = ATOMIC_INIT(0);
+
/*
* alloc_fs_devices - allocate struct btrfs_fs_devices
* @fsid: if not NULL, copy the UUID to fs_devices::fsid
@@ -5482,7 +5485,8 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
else
num_stripes = map->num_stripes;
- preferred_mirror = first + current->pid % num_stripes;
+ preferred_mirror = first +
+ (unsigned)atomic_inc_return(&rr_counter) % num_stripes;
if (dev_replace_is_ongoing &&
fs_info->dev_replace.cont_reading_from_srcdev_mode ==

* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  5:59 [PATCH] btrfs: balance RAID1/RAID10 mirror selection louis
@ 2020-10-16  7:15 ` Nikolay Borisov
  2020-10-16  7:29   ` Qu Wenruo
  2020-10-16 16:18   ` louis
  2 siblings, 2 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-16 7:15 UTC (permalink / raw)
To: louis, linux-btrfs

On 16.10.20 at 8:59, louis@waffle.tech wrote:
> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double throughput for large reads.
>
> Signed-off-by: Louis Jencka <louis@waffle.tech>

Can you show numbers substantiating your claims?
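Both selection policies can be sketched in user space. This is a rough illustration only, with invented function names; the real decision is made in find_live_mirror() in fs/btrfs/volumes.c:

```c
#include <stdatomic.h>

/* Shared counter, analogous to the patch's rr_counter. */
static atomic_uint rr_counter;

/* Old policy: a given process always lands on the same mirror, so a
 * single large sequential reader only ever uses one device. */
unsigned pick_mirror_pid(unsigned first, unsigned pid, unsigned num_stripes)
{
	return first + pid % num_stripes;
}

/* New policy: successive reads rotate across all mirrors. C11's
 * atomic_fetch_add returns the pre-increment value, while the kernel's
 * atomic_inc_return returns the post-increment one; the resulting
 * distribution over num_stripes is the same either way. */
unsigned pick_mirror_rr(unsigned first, unsigned num_stripes)
{
	return first + atomic_fetch_add(&rr_counter, 1) % num_stripes;
}
```

With two mirrors, pick_mirror_rr alternates 0, 1, 0, 1, ... regardless of caller, which is what lets one process's large read use both devices.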
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  7:15 ` Nikolay Borisov
@ 2020-10-16  7:29 ` Qu Wenruo
  2020-10-16 17:28   ` louis
  1 sibling, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-10-16 7:29 UTC (permalink / raw)
To: Nikolay Borisov, louis, linux-btrfs

On 2020/10/16 at 3:15 PM, Nikolay Borisov wrote:
>
> On 16.10.20 at 8:59, louis@waffle.tech wrote:
>> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double
>> throughput for large reads.
>>
>> Signed-off-by: Louis Jencka <louis@waffle.tech>
>
> Can you show numbers substantiating your claims?

And isn't this related to the read policy? IIRC some patches are
floating on the mailing list to provide the framework for how the read
policy should work.

Those patches aren't yet merged?

Thanks,
Qu
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  7:29 ` Qu Wenruo
@ 2020-10-16 17:28 ` louis
  2020-10-18  8:16   ` Anand Jain
  0 siblings, 1 reply; 12+ messages in thread
From: louis @ 2020-10-16 17:28 UTC (permalink / raw)
To: Qu Wenruo, Nikolay Borisov, linux-btrfs, Anand Jain

October 16, 2020 1:29 AM, "Qu Wenruo" <quwenruo.btrfs@gmx.com> wrote:

> On 2020/10/16 at 3:15 PM, Nikolay Borisov wrote:
>
>> On 16.10.20 at 8:59, louis@waffle.tech wrote:
>>> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double
>>> throughput for large reads.
>>>
>>> Signed-off-by: Louis Jencka <louis@waffle.tech>
>>
>> Can you show numbers substantiating your claims?
>
> And isn't this related to the read policy? IIRC some patches are
> floating on the mailing list to provide the framework for how the read
> policy should work.
>
> Those patches aren't yet merged?
>
> Thanks,
> Qu

Oh, I hadn't seen the read-policy framework thread. Within that, this could be
a new policy or a replacement for BTRFS_READ_POLICY_PID. Is work ongoing for
those patches?

Louis
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16 17:28 ` louis
@ 2020-10-18  8:16 ` Anand Jain
  0 siblings, 0 replies; 12+ messages in thread
From: Anand Jain @ 2020-10-18 8:16 UTC (permalink / raw)
To: louis, Qu Wenruo, Nikolay Borisov, linux-btrfs

On 17/10/20 1:28 am, louis@waffle.tech wrote:
> October 16, 2020 1:29 AM, "Qu Wenruo" <quwenruo.btrfs@gmx.com> wrote:
>
>> On 2020/10/16 at 3:15 PM, Nikolay Borisov wrote:
>>
>>> On 16.10.20 at 8:59, louis@waffle.tech wrote:
>>>> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double
>>>> throughput for large reads.
>>>>
>>>> Signed-off-by: Louis Jencka <louis@waffle.tech>
>>>
>>> Can you show numbers substantiating your claims?
>>
>> And isn't this related to the read policy? IIRC some patches are
>> floating on the mailing list to provide the framework for how the read
>> policy should work.
>>
>> Those patches aren't yet merged?
>>
>> Thanks,
>> Qu
>
> Oh, I hadn't seen the read-policy framework thread. Within that, this could be
> a new policy or a replacement for BTRFS_READ_POLICY_PID. Is work ongoing for
> those patches?

Yes. Those patches are just a few days away; please stay tuned.

Thanks,
Anand

> Louis
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  7:15 ` Nikolay Borisov
  2020-10-16  7:29   ` Qu Wenruo
@ 2020-10-16 16:18 ` louis
  1 sibling, 0 replies; 12+ messages in thread
From: louis @ 2020-10-16 16:18 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs

October 16, 2020 1:15 AM, "Nikolay Borisov" <nborisov@suse.com> wrote:

> On 16.10.20 at 8:59, louis@waffle.tech wrote:
>
>> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double
>> throughput for large reads.
>>
>> Signed-off-by: Louis Jencka <louis@waffle.tech>
>
> Can you show numbers substantiating your claims?

Sure thing. Below are the results from some tests I've run using a Debian 10 VM.
It has two 10GiB disks attached, each independently limited to 100MB/s total IO
(using libvirt's iotune feature). They've been used to create the RAID1 volume
mounted at /mnt. I've truncated the fio output to just show the high-level
stats.

Reading a large file performs twice as well with the patch applied (203 MB/s vs
101 MB/s). Writing a large file, and the random read-write tests, look like
they perform roughly the same to me.

Louis

---

Without patch
=============

louis@debian:/mnt$ uname -a
Linux debian 4.19.0-11-amd64 #1 SMP Debian 4.19.146-1 (2020-09-17) x86_64 GNU/Linux

louis@debian:/mnt$ btrfs fi df /mnt
Data, RAID1: total=7.00GiB, used=2.00MiB
System, RAID1: total=8.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=688.00KiB
GlobalReserve, single: total=29.19MiB, used=352.00KiB

louis@debian:/mnt$ dd if=/dev/urandom of=/mnt/test bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.928 s, 83.1 MB/s

louis@debian:/mnt$ dd if=/mnt/test of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 10.6403 s, 101 MB/s

louis@debian:/mnt$ fio --name=random-rw --ioengine=posixaio --rw=randrw --bs=4k --numjobs=2 --size=2g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
...
Run status group 0 (all jobs):
   READ: bw=15.8MiB/s (16.5MB/s), 7459KiB/s-8671KiB/s (7638kB/s-8879kB/s), io=964MiB (1010MB), run=61179-61179msec
  WRITE: bw=15.8MiB/s (16.6MB/s), 7490KiB/s-8682KiB/s (7670kB/s-8890kB/s), io=966MiB (1013MB), run=61179-61179msec

With patch
==========

louis@debian:/mnt$ uname -a
Linux debian 4.19.0-11-amd64 #1 SMP Debian 4.19.146-1a~test (2020-10-15) x86_64 GNU/Linux

louis@debian:/mnt$ dd if=/dev/urandom of=/mnt/test bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.8067 s, 90.9 MB/s

louis@debian:/mnt$ dd if=/mnt/test of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.28642 s, 203 MB/s

louis@debian:/mnt$ fio --name=random-rw --ioengine=posixaio --rw=randrw --bs=4k --numjobs=2 --size=2g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
...
Run status group 0 (all jobs):
   READ: bw=16.5MiB/s (17.3MB/s), 8217KiB/s-8652KiB/s (8414kB/s-8860kB/s), io=1025MiB (1074MB), run=62202-62202msec
  WRITE: bw=16.5MiB/s (17.3MB/s), 8218KiB/s-8698KiB/s (8415kB/s-8907kB/s), io=1028MiB (1077MB), run=62202-62202msec
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  5:59 [PATCH] btrfs: balance RAID1/RAID10 mirror selection louis
  2020-10-16  7:15 ` Nikolay Borisov
@ 2020-10-16 17:21 ` waxhead
  2020-10-23  7:38 ` Wang Yugui
  2 siblings, 0 replies; 12+ messages in thread
From: waxhead @ 2020-10-16 17:21 UTC (permalink / raw)
To: louis, linux-btrfs

louis@waffle.tech wrote:
> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double throughput for large reads.
>
> Signed-off-by: Louis Jencka <louis@waffle.tech>
> ---
>  fs/btrfs/volumes.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 58b9c419a..45c581d46 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -333,6 +333,9 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void)
>  	return &fs_uuids;
>  }
>
> +/* Used for round-robin balancing RAID1/RAID10 reads. */
> +atomic_t rr_counter = ATOMIC_INIT(0);
> +
>  /*
>   * alloc_fs_devices - allocate struct btrfs_fs_devices
>   * @fsid: if not NULL, copy the UUID to fs_devices::fsid
> @@ -5482,7 +5485,8 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>  	else
>  		num_stripes = map->num_stripes;
>
> -	preferred_mirror = first + current->pid % num_stripes;
> +	preferred_mirror = first +
> +		(unsigned)atomic_inc_return(&rr_counter) % num_stripes;
>
>  	if (dev_replace_is_ongoing &&
>  	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==

I am just a regular user of BTRFS, but the btrfs_fs_info structure contains
a dirty_metadata_bytes member. Assuming that writing data leads to
dirty_metadata_bytes being non-zero, would not something along the lines of
this potentially be better?

if (fs_info->dirty_metadata_bytes)
	atomic_inc_return(&rr_counter);
preferred_mirror = first +
	((unsigned)atomic_read(&rr_counter) + current->pid) % num_stripes;

My knowledge of the BTRFS code is near zero and I should know better than
posting "code" here, but I can't resist getting my point across. There must
be a simple way to improve slightly over a regular rr scheme.
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-16  5:59 [PATCH] btrfs: balance RAID1/RAID10 mirror selection louis
  2020-10-16  7:15 ` Nikolay Borisov
  2020-10-16 17:21 ` waxhead
@ 2020-10-23  7:38 ` Wang Yugui
  2020-10-23  7:42   ` Nikolay Borisov
  2020-10-27 14:26   ` Wang Yugui
  2 siblings, 2 replies; 12+ messages in thread
From: Wang Yugui @ 2020-10-23 7:38 UTC (permalink / raw)
To: louis; +Cc: linux-btrfs

Hi, Louis Jencka

Can we move 'atomic_t rr_counter' into 'struct btrfs_fs_info' to
support multiple mounted btrfs filesystems?

Although the 'readmirror feature (read_policy sysfs)' is talked about,
round-robin is a replacement for BTRFS_READ_POLICY_PID in most cases,
so we no longer need BTRFS_READ_POLICY_PID?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2020/10/23

> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double throughput for large reads.
>
> Signed-off-by: Louis Jencka <louis@waffle.tech>
> ---
>  fs/btrfs/volumes.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 58b9c419a..45c581d46 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -333,6 +333,9 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void)
>  	return &fs_uuids;
>  }
>
> +/* Used for round-robin balancing RAID1/RAID10 reads. */
> +atomic_t rr_counter = ATOMIC_INIT(0);
> +
>  /*
>   * alloc_fs_devices - allocate struct btrfs_fs_devices
>   * @fsid: if not NULL, copy the UUID to fs_devices::fsid
> @@ -5482,7 +5485,8 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>  	else
>  		num_stripes = map->num_stripes;
>
> -	preferred_mirror = first + current->pid % num_stripes;
> +	preferred_mirror = first +
> +		(unsigned)atomic_inc_return(&rr_counter) % num_stripes;
>
>  	if (dev_replace_is_ongoing &&
>  	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-23  7:38 ` Wang Yugui
@ 2020-10-23  7:42 ` Nikolay Borisov
  2020-11-02 19:29   ` louis
  1 sibling, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-23 7:42 UTC (permalink / raw)
To: Wang Yugui, louis; +Cc: linux-btrfs

On 23.10.20 at 10:38, Wang Yugui wrote:
> Hi, Louis Jencka
>
> Can we move 'atomic_t rr_counter' into 'struct btrfs_fs_info' to
> support multiple mounted btrfs filesystems?

And introduce constant cache-line pings for every read. This thing needs
to be tested under load with perf to see what kind of overhead the
shared atomic_t counter adds. My hunch is this should really be a
per-cpu variable.

<snip>
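User space has no exact analogue of a kernel per-cpu variable, but a per-thread counter gives the flavor of the suggestion. This is a sketch under that assumption; in-kernel the natural tool would be a per-cpu counter (e.g. this_cpu_inc_return()), not thread-local storage:

```c
/* Per-thread counter: the read path never writes a shared cache line,
 * so there is no cross-CPU ping-pong on the hot path. Each thread
 * rotates through the mirrors independently; the aggregate load across
 * mirrors still evens out. */
static _Thread_local unsigned rr_local;

unsigned pick_mirror_local(unsigned first, unsigned num_stripes)
{
	return first + rr_local++ % num_stripes;
}
```

The trade-off is that two CPUs may momentarily prefer the same mirror, but nothing on the read path is shared or contended.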
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-23  7:42 ` Nikolay Borisov
@ 2020-11-02 19:29 ` louis
  2020-11-02 19:36   ` louis
  0 siblings, 1 reply; 12+ messages in thread
From: louis @ 2020-11-02 19:29 UTC (permalink / raw)
To: Nikolay Borisov, Wang Yugui; +Cc: linux-btrfs

Good points. I'll move it into 'struct btrfs_fs_info', look at making it a
per-cpu variable, and do some testing with perf (later this week, hopefully).

Louis

October 23, 2020 1:42 AM, "Nikolay Borisov" <nborisov@suse.com> wrote:

> On 23.10.20 at 10:38, Wang Yugui wrote:
>
>> Hi, Louis Jencka
>>
>> Can we move 'atomic_t rr_counter' into 'struct btrfs_fs_info' to
>> support multiple mounted btrfs filesystems?
>
> And introduce constant cache-line pings for every read. This thing needs
> to be tested under load with perf to see what kind of overhead the
> shared atomic_t counter adds. My hunch is this should really be a
> per-cpu variable.
>
> <snip>
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-11-02 19:29 ` louis
@ 2020-11-02 19:36 ` louis
  0 siblings, 0 replies; 12+ messages in thread
From: louis @ 2020-11-02 19:36 UTC (permalink / raw)
Cc: linux-btrfs

Ha, never mind. Looks like Anand Jain beat me to the punch.

Louis

November 2, 2020 12:29 PM, louis@waffle.tech wrote:

> Good points. I'll move it into 'struct btrfs_fs_info', look at making it a
> per-cpu variable, and do some testing with perf (later this week, hopefully).
>
> Louis
>
> October 23, 2020 1:42 AM, "Nikolay Borisov" <nborisov@suse.com> wrote:
>
>> On 23.10.20 at 10:38, Wang Yugui wrote:
>>
>>> Hi, Louis Jencka
>>>
>>> Can we move 'atomic_t rr_counter' into 'struct btrfs_fs_info' to
>>> support multiple mounted btrfs filesystems?
>>
>> And introduce constant cache-line pings for every read. This thing needs
>> to be tested under load with perf to see what kind of overhead the
>> shared atomic_t counter adds. My hunch is this should really be a
>> per-cpu variable.
>>
>> <snip>
* Re: [PATCH] btrfs: balance RAID1/RAID10 mirror selection
  2020-10-23  7:38 ` Wang Yugui
  2020-10-23  7:42   ` Nikolay Borisov
@ 2020-10-27 14:26 ` Wang Yugui
  1 sibling, 0 replies; 12+ messages in thread
From: Wang Yugui @ 2020-10-27 14:26 UTC (permalink / raw)
To: louis, linux-btrfs; +Cc: anand.jain

Hi, Louis Jencka

Cc: Anand Jain

Maybe we still need BTRFS_READ_POLICY_PID because of readahead.

There is readahead inside the OS, and readahead inside some disks.
For most SSD/SAS and SSD/SATA drives there seems to be readahead inside
the disk, but for some SSD/NVMe drives there seems to be NO readahead
inside the disk. BTRFS_READ_POLICY_PID cooperates with readahead better
in some cases.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2020/10/27

> Hi, Louis Jencka
>
> Can we move 'atomic_t rr_counter' into 'struct btrfs_fs_info' to
> support multiple mounted btrfs filesystems?
>
> Although the 'readmirror feature (read_policy sysfs)' is talked about,
> round-robin is a replacement for BTRFS_READ_POLICY_PID in most cases,
> so we no longer need BTRFS_READ_POLICY_PID?
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2020/10/23
>
>> Balance RAID1/RAID10 mirror selection via plain round-robin scheduling. This should roughly double throughput for large reads.
>>
>> Signed-off-by: Louis Jencka <louis@waffle.tech>
>> ---
>>  fs/btrfs/volumes.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 58b9c419a..45c581d46 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -333,6 +333,9 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void)
>>  	return &fs_uuids;
>>  }
>>
>> +/* Used for round-robin balancing RAID1/RAID10 reads. */
>> +atomic_t rr_counter = ATOMIC_INIT(0);
>> +
>>  /*
>>   * alloc_fs_devices - allocate struct btrfs_fs_devices
>>   * @fsid: if not NULL, copy the UUID to fs_devices::fsid
>> @@ -5482,7 +5485,8 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>>  	else
>>  		num_stripes = map->num_stripes;
>>
>> -	preferred_mirror = first + current->pid % num_stripes;
>> +	preferred_mirror = first +
>> +		(unsigned)atomic_inc_return(&rr_counter) % num_stripes;
>>
>>  	if (dev_replace_is_ongoing &&
>>  	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
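The readahead concern can be made concrete with a toy routing model (an illustration only, not btrfs code): a pid-style policy sends one process's whole sequential stream to a single device, while round-robin hands each device a strided pattern that in-disk readahead cannot follow.

```c
#include <stddef.h>

#define NBLOCKS 8

/* Route NBLOCKS consecutive block reads from one process to two mirrors
 * under either policy, record the block numbers each device sees, and
 * return how many blocks device 0 received. */
size_t route(int use_rr, unsigned pid, int seen[2][NBLOCKS])
{
	size_t count[2] = { 0, 0 };
	unsigned rr = 0;

	for (int blk = 0; blk < NBLOCKS; blk++) {
		unsigned dev = use_rr ? rr++ % 2 : pid % 2;
		seen[dev][count[dev]++] = blk;
	}
	return count[0];
}
```

Under the pid policy an even-pid process reads blocks 0, 1, 2, ... from device 0 back to back; under round-robin device 0 sees 0, 2, 4, ... and device 1 sees 1, 3, 5, ..., so each disk observes a gapped stream.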