public inbox for linux-kernel@vger.kernel.org
* [patch] raid-2.5.1-I7
@ 2001-12-18  1:09 Ingo Molnar
  2001-12-18  6:37 ` Fabbione
  2001-12-18 13:08 ` Roy Sigurd Karlsbakk
  0 siblings, 2 replies; 7+ messages in thread
From: Ingo Molnar @ 2001-12-18  1:09 UTC (permalink / raw)
  To: linux-raid; +Cc: linux-kernel, Jens Axboe, Linus Torvalds

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2244 bytes --]


the attached patch (against 2.5.1-final) includes the next round of RAID-1
improvements. First, it completes the raid1.c cleanups I planned, and it
also adds a number of new RAID-1 performance features.

Changelog:

 - cleaned up the resync engine. It got much simpler and easier to
   maintain, while still saturating the disks. Resync doesn't get stuck
   under heavy load anymore. (This code can be switched to use explicit IO
   barrier requests in the future.)

 - rewrote the read balancing code to use three estimators: a per-array
   'next expected sequential IO' position, an IRQ-driven per-disk
   'estimated disk head' position, and a per-disk count of pending
   requests. The head position is now updated from all the IO
   completion routines: end of READ, end of WRITE, end of resync-READ and
   end of resync-WRITE. The read balancer now detects idle disks and
   utilizes them before trying to read-balance between busy disks. I've
   also removed the sector_count limit that artificially switched the
   current disk. These changes make read balancing more accurate and more
   effective.

 - the old raid1 code had a limitation: it always read from the first
   disk until the resync finished. Now the code read-balances READ
   requests up to the resync boundary. This should further improve
   performance during resyncs.

 - added the 'idle IO resync' feature which we used to have in the 2.2
   patches, but via a different implementation that does not touch the
   generic block IO code. Resync happens only when there is no normal IO
   pending on the array. This feature should make resync a more seamless
   operation. Resync behavior can be tuned via the speed_limit_min and
   speed_limit_max sysctl tunables. The default minimum resync speed
   is 500 KB/sec, the maximum is 200 MB/sec.

 - fixed a number of sector_t <=> unsigned long bugs still left.
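The read-balancing heuristic described in the changelog above can be sketched in plain C. This is a hypothetical simplification for illustration, not the actual raid1.c code; only the field names `head_position` and `nr_pending` are taken from the patch, everything else (struct layout, function names) is made up here:

```c
/*
 * Hypothetical, simplified model of the read balancer described in the
 * changelog -- not the actual raid1.c code.  Each mirror tracks an
 * estimated head position (updated from the IO completion routines)
 * and a count of pending requests; a completely idle disk is used
 * outright, otherwise the disk whose estimated head is closest to the
 * target sector wins.
 */
struct mirror {
	long long head_position;	/* last known head sector */
	int nr_pending;			/* in-flight requests */
};

static long long seek_distance(long long a, long long b)
{
	return a > b ? a - b : b - a;
}

int read_balance(const struct mirror *m, int ndisks, long long sector)
{
	int i, best = 0;

	/* an idle disk is utilized before balancing between busy ones */
	for (i = 0; i < ndisks; i++)
		if (m[i].nr_pending == 0)
			return i;

	/* otherwise pick the shortest estimated seek */
	for (i = 1; i < ndisks; i++)
		if (seek_distance(m[i].head_position, sector) <
		    seek_distance(m[best].head_position, sector))
			best = i;
	return best;
}
```

The real code also folds in the per-array 'next expected sequential IO' position so that sequential streams stick to one disk; the sketch omits that for brevity.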
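The 'idle IO resync' behavior above amounts to a throttling decision. The following is a hypothetical model, not the actual md.c code: `resync_speed` and `normal_io_pending` are invented names, and the defaults are the ones the changelog states for the speed_limit_min/speed_limit_max sysctls:

```c
/*
 * Hypothetical model of the 'idle IO resync' throttling described in
 * the changelog -- not the actual md.c code.  Resync runs at full
 * speed only while no normal IO is pending on the array; under load
 * it is held to the guaranteed minimum.
 */
static int speed_limit_min = 500;	/* KB/sec, sysctl speed_limit_min */
static int speed_limit_max = 200000;	/* KB/sec, sysctl speed_limit_max */

/* KB/sec budget for the next resync window */
int resync_speed(int normal_io_pending, int current_speed)
{
	if (!normal_io_pending)
		return speed_limit_max;	/* array idle: use all bandwidth */
	if (current_speed > speed_limit_min)
		return 0;		/* throttle: let normal IO proceed */
	return speed_limit_min;		/* never below the guaranteed min */
}
```

On a running system the two limits are exposed as /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max.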

despite the new features, the patch makes raid1.c 8% smaller, so it's a
win-win situation :-) I've tested the patch on both UP and SMP, and it's
working just fine for me in both degraded and normal mode, but the usual
warnings (do not use on production systems, etc.) apply.

Comments, reports, suggestions welcome,

	Ingo

[-- Attachment #2: Type: APPLICATION/x-gzip, Size: 8832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] raid-2.5.1-I7
  2001-12-18  1:09 [patch] raid-2.5.1-I7 Ingo Molnar
@ 2001-12-18  6:37 ` Fabbione
  2001-12-18 13:08 ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 7+ messages in thread
From: Fabbione @ 2001-12-18  6:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:

> the attached patch (against 2.5.1-final) includes the next round of RAID-1
> improvements. First, it completes the raid1.c cleanups I planned, and it
> also adds a number of new RAID-1 performance features.
> 
> 
> Comments, reports, suggestions welcome,
> 
> 	Ingo
> 

Hi Ingo,
		a simple question: do you have any plans to port these
performance improvements to 2.4?

Thanks
Fabio

-- 
Debian GNU/Linux Unstable Kernel 2.4.15aa1
fabbione on irc.atdot.it #coredump #kchat | fabbione@fabbione.net



* Re: [patch] raid-2.5.1-I7
  2001-12-18  1:09 [patch] raid-2.5.1-I7 Ingo Molnar
  2001-12-18  6:37 ` Fabbione
@ 2001-12-18 13:08 ` Roy Sigurd Karlsbakk
  2001-12-18 17:13   ` Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Roy Sigurd Karlsbakk @ 2001-12-18 13:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-raid, linux-kernel, Jens Axboe, Linus Torvalds

Does this have anything to do with the bug I've reported about 2.4.x
slowing down I/O after heavy sequential read-only access to >=50 files
concurrently? (see BUG raid subsys)

roy

On Mon, 17 Dec 2001, Ingo Molnar wrote:

>
> the attached patch (against 2.5.1-final) includes the next round of RAID-1
> improvements. First, it completes the raid1.c cleanups I planned, and it
> also adds a number of new RAID-1 performance features.
>
> Changelog:
>
>  - cleaned up the resync engine. It got much simpler and easier to
>    maintain, while still saturating the disks. Resync doesn't get stuck
>    under heavy load anymore. (This code can be switched to use explicit IO
>    barrier requests in the future.)
>
>  - rewrote the read balancing code to use three estimators: a per-array
>    'next expected sequential IO' position, an IRQ-driven per-disk
>    'estimated disk head' position, and a per-disk count of pending
>    requests. The head position is now updated from all the IO
>    completion routines: end of READ, end of WRITE, end of resync-READ and
>    end of resync-WRITE. The read balancer now detects idle disks and
>    utilizes them before trying to read-balance between busy disks. I've
>    also removed the sector_count limit that artificially switched the
>    current disk. These changes make read balancing more accurate and more
>    effective.
>
>  - the old raid1 code had a limitation: it always read from the first
>    disk until the resync finished. Now the code read-balances READ
>    requests up to the resync boundary. This should further improve
>    performance during resyncs.
>
>  - added the 'idle IO resync' feature which we used to have in the 2.2
>    patches, but via a different implementation that does not touch the
>    generic block IO code. Resync happens only when there is no normal IO
>    pending on the array. This feature should make resync a more seamless
>    operation. Resync behavior can be tuned via the speed_limit_min and
>    speed_limit_max sysctl tunables. The default minimum resync speed
>    is 500 KB/sec, the maximum is 200 MB/sec.
>
>  - fixed a number of sector_t <=> unsigned long bugs still left.
>
> despite the new features, the patch makes raid1.c 8% smaller, so it's a
> win-win situation :-) I've tested the patch on both UP and SMP, and it's
> working just fine for me in both degraded and normal mode, but the usual
> warnings (do not use on production systems, etc.) apply.
>
> Comments, reports, suggestions welcome,
>
> 	Ingo
>

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.



* Re: [patch] raid-2.5.1-I7
  2001-12-18 17:13   ` Ingo Molnar
@ 2001-12-18 15:31     ` Roy Sigurd Karlsbakk
  2001-12-19 15:09     ` [patch] raid-2.5.1-I8 Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Roy Sigurd Karlsbakk @ 2001-12-18 15:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, linux-raid, linux-kernel, Jens Axboe, Linus Torvalds

> > Does this have anything to do with the bug I've reported about 2.4.x
> > slowing down I/O after heavy sequential read-only access to >=50 files
> > concurrently? (see BUG raid subsys)
>
> no. You have a RAID-0 array, while the patch I sent only affects RAID-1.
> It's very likely that 50 concurrent reads won't perform well on any device
> (RAID or standalone disk); I hope we can tackle workloads like that later
> in 2.5.

It really DOES perform well ... that is, until it has used all the memory
and reads slow down.


--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.



* Re: [patch] raid-2.5.1-I7
  2001-12-18 13:08 ` Roy Sigurd Karlsbakk
@ 2001-12-18 17:13   ` Ingo Molnar
  2001-12-18 15:31     ` Roy Sigurd Karlsbakk
  2001-12-19 15:09     ` [patch] raid-2.5.1-I8 Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Ingo Molnar @ 2001-12-18 17:13 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk
  Cc: Ingo Molnar, linux-raid, linux-kernel, Jens Axboe, Linus Torvalds


On Tue, 18 Dec 2001, Roy Sigurd Karlsbakk wrote:

> Does this have anything to do with the bug I've reported about 2.4.x
> slowing down I/O after heavy sequential read-only access to >=50 files
> concurrently? (see BUG raid subsys)

no. You have a RAID-0 array, while the patch I sent only affects RAID-1.
It's very likely that 50 concurrent reads won't perform well on any device
(RAID or standalone disk); I hope we can tackle workloads like that later
in 2.5.

	Ingo



* [patch] raid-2.5.1-I8
  2001-12-18 17:13   ` Ingo Molnar
  2001-12-18 15:31     ` Roy Sigurd Karlsbakk
@ 2001-12-19 15:09     ` Ingo Molnar
  2001-12-19 15:17       ` [patch] raid-2.5.1-I9 Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2001-12-19 15:09 UTC (permalink / raw)
  To: linux-raid, linux-kernel; +Cc: Jens Axboe, Linus Torvalds

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1799 bytes --]


the -I7 RAID patch did not apply to 2.5.1 cleanly - the attached -I8
version does.

	Ingo

-I7's Changelog:

 - cleaned up the resync engine. It got much simpler and easier to
   maintain, while still saturating the disks. Resync doesn't get stuck
   under heavy load anymore. (This code can be switched to use explicit IO
   barrier requests in the future.)

 - rewrote the read balancing code to use three estimators: a per-array
   'next expected sequential IO' position, an IRQ-driven per-disk
   'estimated disk head' position, and a per-disk count of pending
   requests. The head position is now updated from all the IO
   completion routines: end of READ, end of WRITE, end of resync-READ and
   end of resync-WRITE. The read balancer now detects idle disks and
   utilizes them before trying to read-balance between busy disks. I've
   also removed the sector_count limit that artificially switched the
   current disk. These changes make read balancing more accurate and more
   effective.

 - the old raid1 code had a limitation: it always read from the first
   disk until the resync finished. Now the code read-balances READ
   requests up to the resync boundary. This should further improve
   performance during resyncs.

 - added the 'idle IO resync' feature which we used to have in the 2.2
   patches, but via a different implementation that does not touch the
   generic block IO code. Resync happens only when there is no normal IO
   pending on the array. This feature should make resync a more seamless
   operation. Resync behavior can be tuned via the speed_limit_min and
   speed_limit_max sysctl tunables. The default minimum resync speed
   is 500 KB/sec, the maximum is 200 MB/sec.

 - fixed a number of sector_t <=> unsigned long bugs still left.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 5676 bytes --]

--- linux/include/linux/raid/raid1.h.orig	Mon Dec 17 14:22:38 2001
+++ linux/include/linux/raid/raid1.h	Wed Dec 19 13:56:30 2001
@@ -9,8 +9,8 @@
 	int		number;
 	int		raid_disk;
 	kdev_t		dev;
-	int		sect_limit;
-	int		head_position;
+	sector_t	head_position;
+	atomic_t	nr_pending;
 
 	/*
 	 * State bits:
@@ -31,23 +31,21 @@
 	int			raid_disks;
 	int			working_disks;
 	int			last_used;
-	sector_t		next_sect;
-	int			sect_count;
+	sector_t		next_seq_sect;
 	mdk_thread_t		*thread, *resync_thread;
 	int			resync_mirrors;
 	mirror_info_t		*spare;
 	spinlock_t		device_lock;
 
 	/* for use when syncing mirrors: */
-	unsigned long	start_active, start_ready,
-		start_pending, start_future;
-	int	cnt_done, cnt_active, cnt_ready,
-		cnt_pending, cnt_future;
-	int	phase;
-	int	window;
-	wait_queue_head_t	wait_done;
-	wait_queue_head_t	wait_ready;
-	spinlock_t		segment_lock;
+
+	spinlock_t		resync_lock;
+	int nr_pending;
+	int barrier;
+	sector_t		next_resync;
+
+	wait_queue_head_t	wait_idle;
+	wait_queue_head_t	wait_resume;
 
 	mempool_t *r1bio_pool;
 	mempool_t *r1buf_pool;
@@ -62,7 +60,8 @@
 #define mddev_to_conf(mddev) ((conf_t *) mddev->private)
 
 /*
- * this is our 'private' 'collective' RAID1 buffer head.
+ * this is our 'private' RAID1 bio.
+ *
  * it contains information about what kind of IO operations were started
  * for this RAID1 operation, and about their status:
  */
@@ -83,6 +82,7 @@
 	 * if the IO is in READ direction, then this bio is used:
 	 */
 	struct bio		*read_bio;
+	int			read_disk;
 	/*
 	 * if the IO is in WRITE direction, then multiple bios are used:
 	 */
@@ -94,5 +94,5 @@
 
 /* bits for r1bio.state */
 #define	R1BIO_Uptodate	1
-#define	R1BIO_SyncPhase	2
+
 #endif
--- linux/include/linux/raid/md_k.h.orig	Mon Dec 17 22:19:02 2001
+++ linux/include/linux/raid/md_k.h	Wed Dec 19 13:56:30 2001
@@ -240,7 +240,7 @@
 
 	int (*stop_resync)(mddev_t *mddev);
 	int (*restart_resync)(mddev_t *mddev);
-	int (*sync_request)(mddev_t *mddev, sector_t sector_nr);
+	int (*sync_request)(mddev_t *mddev, sector_t sector_nr, int go_faster);
 };
 
 
--- linux/drivers/md/raid1.c.orig	Wed Dec 19 13:56:23 2001
+++ linux/drivers/md/raid1.c	Wed Dec 19 13:56:30 2001
@@ -935,9 +935,9 @@
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
 
-	check_all_w_bios_empty(r1_bio);
 	if (r1_bio->read_bio != bio)
 		BUG();
+	update_head_pos(r1_bio->read_disk, r1_bio);
 	/*
 	 * we have read a block, now it needs to be re-written,
 	 * or re-read if the read failed.
@@ -957,13 +957,21 @@
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
 	mddev_t *mddev = r1_bio->mddev;
+	int i;
 
 	if (!uptodate)
 		md_error(mddev, bio->bi_dev);
 
+	for (i = 0; i < MD_SB_DISKS; i++)
+		if (r1_bio->write_bios[i] == bio) {
+			update_head_pos(i, r1_bio);
+			break;
+		}
+
 	if (atomic_dec_and_test(&r1_bio->remaining)) {
-		sync_request_done(r1_bio->sector, mddev_to_conf(mddev));
+		conf_t *conf = mddev_to_conf(mddev);
 		md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, uptodate);
+		resume_device(conf);
 		put_buf(r1_bio);
 	}
 	return 0;
@@ -1073,9 +1081,9 @@
 		r1_bio = list_entry(head->prev, r1bio_t, retry_list);
 		list_del(head->prev);
 		spin_unlock_irqrestore(&retry_list_lock, flags);
-		check_all_w_bios_empty(r1_bio);
 
 		mddev = r1_bio->mddev;
+		conf = mddev_to_conf(mddev);
 		if (mddev->sb_dirty) {
 			printk(KERN_INFO "raid1: dirty sb detected, updating.\n");
 			mddev->sb_dirty = 0;
--- linux/drivers/md/md.c.orig	Mon Dec 17 22:18:41 2001
+++ linux/drivers/md/md.c	Wed Dec 19 13:56:30 2001
@@ -66,7 +66,7 @@
 
 /*
  * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit'
- * is 100 KB/sec, so the extra system load does not show up that much.
+ * is 1000 KB/sec, so the extra system load does not show up that much.
  * Increase it if you want to have more _guaranteed_ speed. Note that
  * the RAID driver will use the maximum available bandwith if the IO
  * subsystem is idle. There is also an 'absolute maximum' reconstruction
@@ -76,8 +76,8 @@
  * you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
  */
 
-static int sysctl_speed_limit_min = 100;
-static int sysctl_speed_limit_max = 100000;
+static int sysctl_speed_limit_min = 1000;
+static int sysctl_speed_limit_max = 200000;
 
 static struct ctl_table_header *raid_table_header;
 
@@ -3336,7 +3336,7 @@
 int md_do_sync(mddev_t *mddev, mdp_disk_t *spare)
 {
 	mddev_t *mddev2;
-	unsigned int max_sectors, currspeed,
+	unsigned int max_sectors, currspeed = 0,
 		j, window, err, serialize;
 	unsigned long mark[SYNC_MARKS];
 	unsigned long mark_cnt[SYNC_MARKS];
@@ -3376,8 +3376,7 @@
 	max_sectors = mddev->sb->size << 1;
 
 	printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev));
-	printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n",
-						sysctl_speed_limit_min);
+	printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n", sysctl_speed_limit_min);
 	printk(KERN_INFO "md: using maximum available idle IO bandwith "
 	       "(but not more than %d KB/sec) for reconstruction.\n",
 	       sysctl_speed_limit_max);
@@ -3409,7 +3408,7 @@
 	for (j = 0; j < max_sectors;) {
 		int sectors;
 
-		sectors = mddev->pers->sync_request(mddev, j);
+		sectors = mddev->pers->sync_request(mddev, j, currspeed < sysctl_speed_limit_min);
 		if (sectors < 0) {
 			err = sectors;
 			goto out;


* [patch] raid-2.5.1-I9
  2001-12-19 15:09     ` [patch] raid-2.5.1-I8 Ingo Molnar
@ 2001-12-19 15:17       ` Ingo Molnar
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2001-12-19 15:17 UTC (permalink / raw)
  To: linux-raid, linux-kernel; +Cc: Jens Axboe, Linus Torvalds

[-- Attachment #1: Type: TEXT/PLAIN, Size: 109 bytes --]


-I9: patch mixup again; this one actually does include the intended
changes. Brown paper bag time ...

	Ingo

[-- Attachment #2: Type: APPLICATION/x-gzip, Size: 8595 bytes --]


end of thread

Thread overview: 7+ messages
2001-12-18  1:09 [patch] raid-2.5.1-I7 Ingo Molnar
2001-12-18  6:37 ` Fabbione
2001-12-18 13:08 ` Roy Sigurd Karlsbakk
2001-12-18 17:13   ` Ingo Molnar
2001-12-18 15:31     ` Roy Sigurd Karlsbakk
2001-12-19 15:09     ` [patch] raid-2.5.1-I8 Ingo Molnar
2001-12-19 15:17       ` [patch] raid-2.5.1-I9 Ingo Molnar
