* RAID1 ramdisk patch
@ 2005-09-05 0:46 Wilco Baan Hofman
2005-09-05 1:27 ` Neil Brown
0 siblings, 1 reply; 19+ messages in thread
From: Wilco Baan Hofman @ 2005-09-05 0:46 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 680 bytes --]
Hi all,
I have written a small patch for use with a HDD-backed ramdisk in the md
raid1 driver. The raid1 driver usually does read balancing on the disks,
but I feel that if it encounters a single ram disk in the array that
should be the preferred read disk. The application of this would be for
example a 2GB ram disk in raid1 with a 2GB partition, where the ram disk
is used for reading and both 'disks' used for writing.
Attached is a bit of code which checks for a ram-disk and sets it as
preferred disk. It also checks if the ram disk is in sync before
allowing the read.
PS. I am not on this list, so please CC me on any reply.
Regards,
Wilco Baan Hofman
[-- Attachment #2: syn-raid1ramdisk-20050905.patch --]
[-- Type: text/plain, Size: 2947 bytes --]
diff -urN linux-2.6.13-rc6.orig/include/linux/raid/raid1.h linux-2.6.13-rc6/include/linux/raid/raid1.h
--- linux-2.6.13-rc6.orig/include/linux/raid/raid1.h 2005-08-07 20:18:56.000000000 +0200
+++ linux-2.6.13-rc6/include/linux/raid/raid1.h 2005-09-04 11:41:24.000000000 +0200
@@ -32,6 +32,7 @@
int raid_disks;
int working_disks;
int last_used;
+ int preferred_read_disk;
sector_t next_seq_sect;
spinlock_t device_lock;
diff -urN linux-2.6.13-rc6.orig/drivers/md/raid1.c linux-2.6.13-rc6/drivers/md/raid1.c
--- linux-2.6.13-rc6.orig/drivers/md/raid1.c 2005-08-07 20:18:56.000000000 +0200
+++ linux-2.6.13-rc6/drivers/md/raid1.c 2005-09-05 01:54:26.000000000 +0200
@@ -21,6 +21,8 @@
* Additions to bitmap code, (C) 2003-2004 Paul Clements, SteelEye Technology:
* - persistent bitmap code
*
+ * Special handling of ramdisk (C) 2005 Wilco Baan Hofman <wilco@baanhofman.nl>
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2, or (at your option)
@@ -399,8 +401,6 @@
goto rb_out;
}
}
- disk = new_disk;
- /* now disk == new_disk == starting point for search */
/*
* Don't change to another disk for sequential reads:
@@ -409,7 +409,18 @@
goto rb_out;
if (this_sector == conf->mirrors[new_disk].head_position)
goto rb_out;
-
+
+ /* [SYN] If the preferred disk exists, return it */
+ if (conf->preferred_read_disk != -1 &&
+ (new_rdev=conf->mirrors[conf->preferred_read_disk].rdev) != NULL &&
+ new_rdev->in_sync) {
+ new_disk = conf->preferred_read_disk;
+ goto rb_out;
+ }
+
+ disk = new_disk;
+ /* now disk == new_disk == starting point for search */
+
current_distance = abs(this_sector - conf->mirrors[disk].head_position);
/* Find the disk whose head is closest */
@@ -1292,10 +1303,11 @@
static int run(mddev_t *mddev)
{
conf_t *conf;
- int i, j, disk_idx;
+ int i, j, disk_idx, ram_count;
mirror_info_t *disk;
mdk_rdev_t *rdev;
struct list_head *tmp;
+ char b[BDEVNAME_SIZE];
if (mddev->level != 1) {
printk("raid1: %s: raid level not set to mirroring (%d)\n",
@@ -1417,6 +1429,30 @@
mddev->queue->unplug_fn = raid1_unplug;
mddev->queue->issue_flush_fn = raid1_issue_flush;
+ /* [SYN] if there is a ram disk, that will be the preferred disk.
+ * .. unless there are multiple ram disks. */
+ conf->preferred_read_disk = -1;
+ for (i = 0,
+ ram_count = 0;
+ i < mddev->raid_disks;
+ i++) {
+
+ bdevname(conf->mirrors[i].rdev->bdev, b);
+ if (strncmp(b, "ram", 3) == 0) {
+ if (ram_count) {
+ conf->preferred_read_disk = -1;
+ break;
+ }
+ conf->preferred_read_disk = i;
+ ram_count++;
+ }
+ }
+ if (conf->preferred_read_disk >= 0) {
+ printk(KERN_INFO
+ "raid1: One ram disk (%s) found, setting it preferred read disk.\n", b);
+ }
+
+
return 0;
out_no_mem:
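
The selection rule in the run() hunk above can be modelled outside the kernel. What follows is only a minimal userspace sketch, not the patch itself: the function name pick_preferred_read_disk and the array-of-names interface are hypothetical. It returns the index of the chosen device, so a caller can report that device's name; in the hunk above, the buffer b holds the name of the last device examined when the printk runs, which is not necessarily the ram disk. The sketch also skips empty slots, on the assumption that a degraded array can have a member with no device attached.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the md driver.
 * Returns the index of the only device whose name starts with "ram",
 * or -1 when there is no such device or more than one.
 * A NULL entry models an empty slot in a degraded array (assumption).
 */
static int pick_preferred_read_disk(const char *const names[], int n)
{
	int preferred = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (names[i] == NULL)
			continue;	/* missing member: nothing to inspect */
		if (strncmp(names[i], "ram", 3) != 0)
			continue;	/* not a ram disk */
		if (preferred != -1)
			return -1;	/* second ram disk: no preference */
		preferred = i;
	}
	return preferred;
}
```

With the names { "ram0", "hda1" } the sketch returns 0; with two ram disks, or none, it returns -1, which matches the "unless there are multiple ram disks" comment in the patch.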
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: RAID1 ramdisk patch
2005-09-05 0:46 RAID1 ramdisk patch Wilco Baan Hofman
@ 2005-09-05 1:27 ` Neil Brown
2005-09-05 7:40 ` Wilco Baan Hofman
2005-11-16 13:36 ` segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch) Sander
0 siblings, 2 replies; 19+ messages in thread
From: Neil Brown @ 2005-09-05 1:27 UTC (permalink / raw)
To: Wilco Baan Hofman; +Cc: linux-kernel

On Monday September 5, wilco@baanhofman.nl wrote:
> Hi all,
>
> I have written a small patch for use with a HDD-backed ramdisk in the md
> raid1 driver. The raid1 driver usually does read balancing on the disks,
> but I feel that if it encounters a single ram disk in the array that
> should be the preferred read disk. The application of this would be for
> example a 2GB ram disk in raid1 with a 2GB partition, where the ram disk
> is used for reading and both 'disks' used for writing.
>
> Attached is a bit of code which checks for a ram-disk and sets it as
> preferred disk. It also checks if the ram disk is in sync before
> allowing the read.

Hi,
 equivalent functionality is now available in 2.6-mm and is referred
 to as 'write mostly'.
 If you use mdadm-2.0 and mark a device as --write-mostly, then all
 read requests will go to the other device(s) if possible.
 e.g.
    mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
          --write-mostly /dev/realdisk

 Does this suit your needs?

 You can also arrange for the write to the write-mostly device to be
 'write-behind' so that the filesystem doesn't wait for the write to
 complete. This can reduce write-latency (though not increase write
 throughput) at a very small cost of reliability (if the RAM dies, the
 disk may not be 100% up-to-date).

NeilBrown
^ permalink raw reply [flat|nested] 19+ messages in thread
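
The 'write mostly' behaviour described in the reply above ("all read requests will go to the other device(s) if possible") can be pictured with a small model. This is a minimal sketch under stated assumptions, not the md implementation: struct mirror, its fields and choose_read_disk are hypothetical names, and real read balancing also weighs head position, which the sketch ignores.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one raid1 member (assumption). */
struct mirror {
	int in_sync;		/* member holds current data */
	int write_mostly;	/* member should be avoided for reads */
};

/*
 * Return the index of the member to read from, or -1 if no member is
 * in sync. In-sync members that are not write-mostly win; a
 * write-mostly member is used only when nothing else is available.
 */
static int choose_read_disk(const struct mirror *m, int n)
{
	int fallback = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (!m[i].in_sync)
			continue;	/* never read from a stale member */
		if (!m[i].write_mostly)
			return i;	/* ordinary member: use it */
		if (fallback == -1)
			fallback = i;	/* remember first write-mostly member */
	}
	return fallback;
}
```

For the ramdisk-plus-partition array from the original post, marking the partition write-mostly sends reads to the ram disk while it is in sync, and back to the partition when it is not.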
* Re: RAID1 ramdisk patch
2005-09-05 1:27 ` Neil Brown
@ 2005-09-05 7:40 ` Wilco Baan Hofman
2005-11-16 13:36 ` segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch) Sander
1 sibling, 0 replies; 19+ messages in thread
From: Wilco Baan Hofman @ 2005-09-05 7:40 UTC (permalink / raw)
To: linux-kernel

Neil Brown wrote:

>On Monday September 5, wilco@baanhofman.nl wrote:
>
>>Hi all,
>>
>>I have written a small patch for use with a HDD-backed ramdisk in the md
>>raid1 driver. The raid1 driver usually does read balancing on the disks,
>>but I feel that if it encounters a single ram disk in the array that
>>should be the preferred read disk. The application of this would be for
>>example a 2GB ram disk in raid1 with a 2GB partition, where the ram disk
>>is used for reading and both 'disks' used for writing.
>>
>>Attached is a bit of code which checks for a ram-disk and sets it as
>>preferred disk. It also checks if the ram disk is in sync before
>>allowing the read.
>>
>
>Hi,
> equivalent functionality is now available in 2.6-mm and is referred
> to as 'write mostly'.
> If you use mdadm-2.0 and mark a device as --write-mostly, then all
> read requests will go to the other device(s) if possible.
> e.g.
>    mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
>          --write-mostly /dev/realdisk
>
> Does this suit your needs?
>
> You can also arrange for the write to the write-mostly device to be
> 'write-behind' so that the filesystem doesn't wait for the write to
> complete. This can reduce write-latency (though not increase write
> throughput) at a very small cost of reliability (if the RAM dies, the
> disk may not be 100% up-to-date).
>
>NeilBrown
>

I was looking for that (but couldn't find it). At this point I don't
see why it wouldn't suit my needs: if it also syncs from the partition,
then it is basically the same functionality, written from a different
perspective.

To use it I'll have to deviate from stock Linux and use a non-packaged
mdadm, but that is better than applying my patch on every kernel
update ;-)

Thanks, I'll look into it.

Wilco Baan Hofman
^ permalink raw reply [flat|nested] 19+ messages in thread
* segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-09-05 1:27 ` Neil Brown
2005-09-05 7:40 ` Wilco Baan Hofman
@ 2005-11-16 13:36 ` Sander
2005-11-16 22:20 ` Andrew Morton
1 sibling, 1 reply; 19+ messages in thread
From: Sander @ 2005-11-16 13:36 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel

Neil Brown wrote (ao):
> If you use mdadm-2.0 and mark a device as --write-mostly, then all
> read requests will go to the other device(s) if possible.
> e.g.
>    mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
>          --write-mostly /dev/realdisk
>
> Does this suit your needs?
>
> You can also arrange for the write to the write-mostly device to be
> 'write-behind' so that the filesystem doesn't wait for the write to
> complete. This can reduce write-latency (though not increase write
> throughput) at a very small cost of reliability (if the RAM dies, the
> disk may not be 100% up-to-date).

With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
try this:

mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
--write-behind /dev/loop1

loop0 is attached to a file on tmpfs, and loop1 is attached
to a file on a lvm2 volume (reiser4, if that matters).

I can create and use the array with:

mdadm -C /dev/md1 -l1 -n2 /dev/loop0 /dev/loop1

and

mdadm -C /dev/md1 -l1 -n2 /dev/loop0 --write-mostly /dev/loop1

mdadm is compiled with:
gcc (GCC) 4.0.3 20051023 (prerelease) (Debian 4.0.2-3)

Can/should I provide more info?

With kind regards, Sander

This is what I get if I reboot, create the images with dd,
attach them with losetup and try to create the array with mdadm:

[42949575.730000] loop: loaded (max 8 devices)
[42949584.840000] md: bind<loop0>
[42949584.840000] md: bind<loop1>
[42949584.840000] md: md1: raid array is not clean -- starting background reconstruction
[42949584.840000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949584.840000] md1: bitmap file is out of date, doing full recovery
[42949584.840000] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[42949584.840000] printing eip:
[42949584.840000] c01c33dd
[42949584.840000] *pde = 00000000
[42949584.840000] Oops: 0000 [#1]
[42949584.840000] last sysfs file: /devices/pci0000:00/0000:00:11.0/i2c-0/name
[42949584.840000] Modules linked in: loop dm_mod i2c_viapro i2c_core
[42949584.840000] CPU: 0
[42949584.840000] EIP: 0060:[<c01c33dd>] Not tainted VLI
[42949584.840000] EFLAGS: 00010286 (2.6.14-mm2)
[42949584.840000] EIP is at prepare_write_unix_file+0x1d/0xab
[42949584.840000] eax: 00000000 ebx: c01c33c0 ecx: 00000000 edx: c104ce60
[42949584.840000] esi: c104ce60 edi: f2f2f4a0 ebp: 00000000 esp: c2d6bd90
[42949584.840000] ds: 007b es: 007b ss: 0068
[42949584.840000] Process mdadm (pid: 749, threadinfo=c2d6b000 task=c3784580)
[42949584.840000] Stack: 30303034 00000000 c104ce60 c01c33c0 c104ce60 f2f2f4a0 00000001 c02b00f2
[42949584.840000] 00001000 00000f00 f2f2f4a0 c2674000 c104ce60 c02b1154 c03a97dc f7c278cc
[42949584.840000] c2d6bddc c02b05b4 c03a975c f7c278cc 00000000 00000000 00000000 00031f20
[42949584.840000] Call Trace:
[42949584.840000] [<c01c33c0>] prepare_write_unix_file+0x0/0xab
[42949584.840000] [<c02b00f2>] write_page+0x52/0x140
[42949584.840000] [<c02b1154>] bitmap_init_from_disk+0x384/0x450
[42949584.840000] [<c02b05b4>] bitmap_read_sb+0x84/0x2f0
[42949584.840000] [<c02b21f3>] bitmap_create+0x1a3/0x2a0
[42949584.840000] [<c02ab95a>] do_md_run+0x2ba/0x500
[42949584.840000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
[42949584.840000] [<c0179034>] mpage_writepages+0x124/0x3d0
[42949584.840000] [<c013c23e>] __pagevec_free+0x3e/0x60
[42949584.840000] [<c013eff9>] release_pages+0x29/0x160
[42949584.840000] [<c02adb81>] md_ioctl+0x5a1/0x630
[42949584.840000] [<c0137918>] find_get_pages+0x18/0x40
[42949584.840000] [<c02ad5e0>] md_ioctl+0x0/0x630
[42949584.840000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
[42949584.840000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
[42949584.840000] [<c015e158>] block_ioctl+0x18/0x20
[42949584.840000] [<c015e140>] block_ioctl+0x0/0x20
[42949584.840000] [<c01674ff>] do_ioctl+0x1f/0x70
[42949584.840000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
[42949584.840000] [<c0156c91>] __fput+0xe1/0x140
[42949584.840000] [<c016785d>] sys_ioctl+0x3d/0x70
[42949584.840000] [<c0102f49>] syscall_call+0x7/0xb
[42949584.840000] Code: 02 00 00 eb 89 89 f6 8d bc 27 00 00 00 00 83 ec 1c 89 5c 24 0c 89 7c 24 14 89 6c 24 18 89 c5 89 74 24 10 89 54 24 08 89 4c 24 04 <8b> 40 08 8b 40 08 8b 80 94 00 00 00 e8 92 20 fd ff 3d 18 fc ff
[42949584.840000]

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-16 13:36 ` segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch) Sander
@ 2005-11-16 22:20 ` Andrew Morton
2005-11-16 23:08 ` Neil Brown
2005-11-18 14:18 ` segfault mdadm --write-behind, 2.6.14-mm2 Vladimir V. Saveliev
0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2005-11-16 22:20 UTC (permalink / raw)
To: sander; +Cc: neilb, linux-kernel, reiserfs-dev

Sander <sander@humilis.net> wrote:
>
> Neil Brown wrote (ao):
> > If you use mdadm-2.0 and mark a device as --write-mostly, then all
> > read requests will go to the other device(s) if possible.
> > e.g.
> >    mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
> >          --write-mostly /dev/realdisk
> >
> > Does this suit your needs?
> >
> > You can also arrange for the write to the write-mostly device to be
> > 'write-behind' so that the filesystem doesn't wait for the write to
> > complete. This can reduce write-latency (though not increase write
> > throughput) at a very small cost of reliability (if the RAM dies, the
> > disk may not be 100% up-to-date).
>
> With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> try this:

It oopsed in reiser4. reiserfs-dev added to Cc...

> mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
> --write-behind /dev/loop1
>
> loop0 is attached to a file on tmpfs, and loop1 is attached
> to a file on a lvm2 volume (reiser4, if that matters).
>
> I can create and use the array with:
>
> mdadm -C /dev/md1 -l1 -n2 /dev/loop0 /dev/loop1
>
> and
>
> mdadm -C /dev/md1 -l1 -n2 /dev/loop0 --write-mostly /dev/loop1
>
> mdadm is compiled with:
> gcc (GCC) 4.0.3 20051023 (prerelease) (Debian 4.0.2-3)
>
> Can/should I provide more info?
>
> With kind regards, Sander
>
> This is what I get if I reboot, create the images with dd,
> attach them with losetup and try to create the array with mdadm:
>
> [42949575.730000] loop: loaded (max 8 devices)
> [42949584.840000] md: bind<loop0>
> [42949584.840000] md: bind<loop1>
> [42949584.840000] md: md1: raid array is not clean -- starting background reconstruction
> [42949584.840000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
> [42949584.840000] md1: bitmap file is out of date, doing full recovery
> [42949584.840000] Unable to handle kernel NULL pointer dereference at virtual address 00000008
> [42949584.840000] printing eip:
> [42949584.840000] c01c33dd
> [42949584.840000] *pde = 00000000
> [42949584.840000] Oops: 0000 [#1]
> [42949584.840000] last sysfs file: /devices/pci0000:00/0000:00:11.0/i2c-0/name
> [42949584.840000] Modules linked in: loop dm_mod i2c_viapro i2c_core
> [42949584.840000] CPU: 0
> [42949584.840000] EIP: 0060:[<c01c33dd>] Not tainted VLI
> [42949584.840000] EFLAGS: 00010286 (2.6.14-mm2)
> [42949584.840000] EIP is at prepare_write_unix_file+0x1d/0xab
> [42949584.840000] eax: 00000000 ebx: c01c33c0 ecx: 00000000 edx: c104ce60
> [42949584.840000] esi: c104ce60 edi: f2f2f4a0 ebp: 00000000 esp: c2d6bd90
> [42949584.840000] ds: 007b es: 007b ss: 0068
> [42949584.840000] Process mdadm (pid: 749, threadinfo=c2d6b000 task=c3784580)
> [42949584.840000] Stack: 30303034 00000000 c104ce60 c01c33c0 c104ce60 f2f2f4a0 00000001 c02b00f2
> [42949584.840000] 00001000 00000f00 f2f2f4a0 c2674000 c104ce60 c02b1154 c03a97dc f7c278cc
> [42949584.840000] c2d6bddc c02b05b4 c03a975c f7c278cc 00000000 00000000 00000000 00031f20
> [42949584.840000] Call Trace:
> [42949584.840000] [<c01c33c0>] prepare_write_unix_file+0x0/0xab
> [42949584.840000] [<c02b00f2>] write_page+0x52/0x140
> [42949584.840000] [<c02b1154>] bitmap_init_from_disk+0x384/0x450
> [42949584.840000] [<c02b05b4>] bitmap_read_sb+0x84/0x2f0
> [42949584.840000] [<c02b21f3>] bitmap_create+0x1a3/0x2a0
> [42949584.840000] [<c02ab95a>] do_md_run+0x2ba/0x500
> [42949584.840000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
> [42949584.840000] [<c0179034>] mpage_writepages+0x124/0x3d0
> [42949584.840000] [<c013c23e>] __pagevec_free+0x3e/0x60
> [42949584.840000] [<c013eff9>] release_pages+0x29/0x160
> [42949584.840000] [<c02adb81>] md_ioctl+0x5a1/0x630
> [42949584.840000] [<c0137918>] find_get_pages+0x18/0x40
> [42949584.840000] [<c02ad5e0>] md_ioctl+0x0/0x630
> [42949584.840000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
> [42949584.840000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
> [42949584.840000] [<c015e158>] block_ioctl+0x18/0x20
> [42949584.840000] [<c015e140>] block_ioctl+0x0/0x20
> [42949584.840000] [<c01674ff>] do_ioctl+0x1f/0x70
> [42949584.840000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
> [42949584.840000] [<c0156c91>] __fput+0xe1/0x140
> [42949584.840000] [<c016785d>] sys_ioctl+0x3d/0x70
> [42949584.840000] [<c0102f49>] syscall_call+0x7/0xb
> [42949584.840000] Code: 02 00 00 eb 89 89 f6 8d bc 27 00 00 00 00 83 ec 1c 89 5c 24 0c 89 7c 24 14 89 6c 24 18 89 c5 89 74 24 10 89 54 24 08 89 4c 24 04 <8b> 40 08 8b 40 08 8b 80 94 00 00 00 e8 92 20 fd ff 3d 18 fc ff
> [42949584.840000]
>
> --
> Humilis IT Services and Solutions
> http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-16 22:20 ` Andrew Morton
@ 2005-11-16 23:08 ` Neil Brown
2005-11-17 7:50 ` Sander
2005-11-18 14:18 ` segfault mdadm --write-behind, 2.6.14-mm2 Vladimir V. Saveliev
1 sibling, 1 reply; 19+ messages in thread
From: Neil Brown @ 2005-11-16 23:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: sander, linux-kernel, reiserfs-dev

On Wednesday November 16, akpm@osdl.org wrote:
> Sander <sander@humilis.net> wrote:
> >
> > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> > try this:
>
> It oopsed in reiser4. reiserfs-dev added to Cc...
>

Hmm... It appears that md/bitmap is calling prepare_write and
commit_write with 'file' as NULL - this works for some filesystems,
but not for reiser4.

Does this patch help?

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-11-17 10:05:18.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-11-17 10:05:40.000000000 +1100
@@ -326,9 +326,9 @@ static int write_page(struct bitmap *bit
 		}
 	}
 
-	ret = page->mapping->a_ops->prepare_write(NULL, page, 0, PAGE_SIZE);
+	ret = page->mapping->a_ops->prepare_write(bitmap->file, page, 0, PAGE_SIZE);
 	if (!ret)
-		ret = page->mapping->a_ops->commit_write(NULL, page, 0,
+		ret = page->mapping->a_ops->commit_write(bitmap->file, page, 0,
 							 PAGE_SIZE);
 	if (ret) {
 		unlock_page(page);
^ permalink raw reply [flat|nested] 19+ messages in thread
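
The diagnosis in the message above (a caller passes NULL for 'file', which some filesystems tolerate and others do not) can be illustrated with a small userspace model. Everything in it is hypothetical: struct file_model, the two callbacks and write_page_model are stand-ins invented for illustration, not kernel interfaces. The second callback checks for NULL so that the sketch stays well defined; a callback that reads a field through the pointer without such a check would fault, which is consistent with the "NULL pointer dereference at virtual address 00000008" in the earlier oops.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file (assumption). */
struct file_model {
	int flags;
};

/* Shape of a prepare_write-like callback in this model. */
typedef int (*prepare_write_fn)(struct file_model *file,
				unsigned from, unsigned to);

/* Models a filesystem that never looks at 'file': NULL is harmless. */
static int prepare_write_ignores_file(struct file_model *file,
				      unsigned from, unsigned to)
{
	(void)file;
	return from <= to ? 0 : -EINVAL;
}

/* Models a filesystem that needs 'file'; the NULL check is the model's
 * own addition so that a missing file is reported instead of crashing. */
static int prepare_write_needs_file(struct file_model *file,
				    unsigned from, unsigned to)
{
	if (file == NULL)
		return -EINVAL;
	(void)file->flags;	/* the access that requires a real file */
	return from <= to ? 0 : -EINVAL;
}

/* Models the caller: it forwards whatever 'file' it was given. */
static int write_page_model(prepare_write_fn prepare,
			    struct file_model *file)
{
	return prepare(file, 0, 4096);
}
```

Calling write_page_model(prepare_write_ignores_file, NULL) succeeds in this model, while write_page_model(prepare_write_needs_file, NULL) returns -EINVAL; passing a real file, as the patch above does with bitmap->file, satisfies both callbacks.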
* Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-16 23:08 ` Neil Brown
@ 2005-11-17 7:50 ` Sander
2005-11-17 10:12 ` Sander
0 siblings, 1 reply; 19+ messages in thread
From: Sander @ 2005-11-17 7:50 UTC (permalink / raw)
To: Neil Brown; +Cc: Andrew Morton, sander, linux-kernel, reiserfs-dev

Neil Brown wrote (ao):
> On Wednesday November 16, akpm@osdl.org wrote:
> > Sander <sander@humilis.net> wrote:
> > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> > > try this:
> >
> > It oopsed in reiser4. reiserfs-dev added to Cc...
> >
>
> Hmm... It appears that md/bitmap is calling prepare_write and
> commit_write with 'file' as NULL - this works for some filesystems,
> but not for reiser4.
>
> Does this patch help?

Something changed, but it didn't fix it, it seems:

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: RUN_ARRAY failed: No such file or directory

(google didn't turn up the same error, but a lot without the 'No such
file or directory')

[42949645.530000] md: bind<loop0>
[42949645.540000] md: bind<loop1>
[42949645.540000] md: md1: raid array is not clean -- starting background reconstruction
[42949645.540000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949645.540000] md1: bitmap file is out of date, doing full recovery
[42949645.560000] md1: bitmap initialized from disk: read 0/7 pages, set 0 bits, status: 1
[42949645.560000] md1: failed to create bitmap (1)
[42949645.560000] md: pers->run() failed ...
[42949645.560000] md: md1 stopped.
[42949645.560000] md: unbind<loop1>
[42949645.560000] md: export_rdev(loop1)
[42949645.560000] md: unbind<loop0>
[42949645.560000] md: export_rdev(loop0)

# ls -l /storage/raid1.bitmap
-rw-r--r-- 1 root root 25856 Nov 17 08:37 /storage/raid1.bitmap

(file is there, let's try again)

~# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
Continue creating array? yes
mdadm: bitmap file /storage/raid1.bitmap already exists, use --force to overwrite

(ok, try with new bitmapfile)

# mdadm -C /dev/md1 --bitmap=/storage/raid.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
Continue creating array? yes
mdadm: RUN_ARRAY failed: No such file or directory

(doesn't work, let's force the first one)

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -f -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:40:50 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:40:50 2005
Continue creating array? yes
Segmentation fault

For some reason, the dmesg is quite a bit longer now.

[42949831.700000] Bad page state at free_hot_cold_page (in process 'mdadm', page c1043220)
[42949831.700000] flags:0x80000001 mapping:00000000 mapcount:0 count:0
[42949831.700000] Backtrace:
[42949831.700000] [<c013b320>] bad_page+0x70/0xb0
[42949831.700000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
[42949831.700000] [<c013f5da>] truncate_inode_pages_range+0x11a/0x310
[42949831.700000] [<c01a2ac0>] reiser4_invalidate_pages+0x90/0xc0
[42949831.700000] [<c01ba5ed>] kill_hook_extent+0x17d/0x5b0
[42949831.700000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.700000] [<c01ba470>] kill_hook_extent+0x0/0x5b0
[42949831.700000] [<c01cd7fd>] call_kill_hooks+0x9d/0xc0
[42949831.700000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.700000] [<c01cdf76>] prepare_for_compact+0x536/0x540
[42949831.700000] [<c0192a0e>] lock_tail+0x1e/0x40
[42949831.700000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.700000] [<c01cd820>] kill_units+0x0/0x80
[42949831.700000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.700000] [<c0192933>] longterm_unlock_znode+0xa3/0x160
[42949831.700000] [<c0192bf3>] longterm_lock_znode+0x163/0x250
[42949831.700000] [<c018ce4b>] jload_gfp+0x5b/0x140
[42949831.700000] [<c01cdfb1>] kill_node40+0x31/0xc0
[42949831.700000] [<c0191a88>] carry_cut+0x48/0x60
[42949831.700000] [<c018f458>] carry_on_level+0x38/0xc0
[42949831.700000] [<c018f302>] carry+0x82/0x1a0
[42949831.700000] [<c018f704>] add_carry+0x24/0x40
[42949831.700000] [<c018f51d>] post_carry+0x3d/0xa0
[42949831.710000] [<c0194886>] kill_node_content+0xf6/0x160
[42949831.710000] [<c0194e39>] cut_tree_worker_common+0x159/0x350
[42949831.710000] [<c0194ce0>] cut_tree_worker_common+0x0/0x350
[42949831.710000] [<c0195155>] cut_tree_object+0x125/0x240
[42949831.710000] [<c0196d29>] reiser4_grab_reserved+0x49/0x190
[42949831.710000] [<c018d04f>] jrelse+0xf/0x20
[42949831.710000] [<c01bfc81>] cut_file_items+0xb1/0x180
[42949831.710000] [<c01a0108>] add_empty_leaf+0xa8/0x220
[42949831.710000] [<c01bfdab>] shorten_file+0x4b/0x260
[42949831.710000] [<c01bfb40>] update_file_size+0x0/0x90
[42949831.710000] [<c01c2f03>] setattr_truncate+0x73/0x210
[42949831.710000] [<c01ad384>] permission_common+0x24/0x40
[42949831.710000] [<c01ad360>] permission_common+0x0/0x40
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0163119>] __link_path_walk+0x89/0xc40
[42949831.710000] [<c01c30fe>] setattr_unix_file+0x5e/0xc0
[42949831.710000] [<c016f58f>] notify_change+0xcf/0x2d5
[42949831.710000] [<c0163d3f>] link_path_walk+0x6f/0xe0
[42949831.710000] [<c0153e9b>] do_truncate+0x4b/0x70
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0164704>] may_open+0x184/0x1d0
[42949831.710000] [<c01647d5>] open_namei+0x85/0x560
[42949831.710000] [<c0154fe2>] filp_open+0x22/0x50
[42949831.710000] [<c01551ad>] get_unused_fd+0x4d/0xb0
[42949831.710000] [<c01552c1>] do_sys_open+0x41/0xd0
[42949831.710000] [<c0102f49>] syscall_call+0x7/0xb
[42949831.710000] Trying to fix it up, but a reboot is needed
[42949831.710000] ------------[ cut here ]------------
[42949831.710000] kernel BUG at mm/filemap.c:480!
[42949831.710000] invalid operand: 0000 [#1]
[42949831.710000] last sysfs file: /devices/pci0000:00/0000:00:11.0/i2c-0/name
[42949831.710000] Modules linked in: loop dm_mod i2c_viapro i2c_core
[42949831.710000] CPU: 0
[42949831.710000] EIP: 0060:[<c013763d>] Tainted: G B VLI
[42949831.710000] EFLAGS: 00010246 (2.6.14-mm2)
[42949831.710000] EIP is at unlock_page+0xd/0x30
[42949831.710000] eax: 00000000 ebx: c1043220 ecx: c03cad30 edx: c1652218
[42949831.710000] esi: 00000001 edi: 00000000 ebp: 00000006 esp: c26c298c
[42949831.710000] ds: 007b es: 007b ss: 0068
[42949831.710000] Process mdadm (pid: 785, threadinfo=c26c2000 task=c6f64050)
[42949831.710000] Stack: c1043220 c013f5e1 0000000e 00007000 f2fb87ec 00000000 00000000 00000007
[42949831.710000] 00000000 c1043220 c1045260 c1040240 c1040260 c1042820 c1042800 c10415e0
[42949831.710000] 00007000 00000000 00000000 00000000 00000006 f2fb8810 00000001 00006fff
[42949831.710000] Call Trace:
[42949831.710000] [<c013f5e1>] truncate_inode_pages_range+0x121/0x310
[42949831.710000] [<c01a2ac0>] reiser4_invalidate_pages+0x90/0xc0
[42949831.710000] [<c01ba5ed>] kill_hook_extent+0x17d/0x5b0
[42949831.710000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.710000] [<c01ba470>] kill_hook_extent+0x0/0x5b0
[42949831.710000] [<c01cd7fd>] call_kill_hooks+0x9d/0xc0
[42949831.710000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.710000] [<c01cdf76>] prepare_for_compact+0x536/0x540
[42949831.710000] [<c0192a0e>] lock_tail+0x1e/0x40
[42949831.710000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.710000] [<c01cd820>] kill_units+0x0/0x80
[42949831.710000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.710000] [<c0192933>] longterm_unlock_znode+0xa3/0x160
[42949831.710000] [<c0192bf3>] longterm_lock_znode+0x163/0x250
[42949831.710000] [<c018ce4b>] jload_gfp+0x5b/0x140
[42949831.710000] [<c01cdfb1>] kill_node40+0x31/0xc0
[42949831.710000] [<c0191a88>] carry_cut+0x48/0x60
[42949831.710000] [<c018f458>] carry_on_level+0x38/0xc0
[42949831.710000] [<c018f302>] carry+0x82/0x1a0
[42949831.710000] [<c018f704>] add_carry+0x24/0x40
[42949831.710000] [<c018f51d>] post_carry+0x3d/0xa0
[42949831.710000] [<c0194886>] kill_node_content+0xf6/0x160
[42949831.710000] [<c0194e39>] cut_tree_worker_common+0x159/0x350
[42949831.710000] [<c0194ce0>] cut_tree_worker_common+0x0/0x350
[42949831.710000] [<c0195155>] cut_tree_object+0x125/0x240
[42949831.710000] [<c0196d29>] reiser4_grab_reserved+0x49/0x190
[42949831.710000] [<c018d04f>] jrelse+0xf/0x20
[42949831.710000] [<c01bfc81>] cut_file_items+0xb1/0x180
[42949831.710000] [<c01a0108>] add_empty_leaf+0xa8/0x220
[42949831.710000] [<c01bfdab>] shorten_file+0x4b/0x260
[42949831.710000] [<c01bfb40>] update_file_size+0x0/0x90
[42949831.710000] [<c01c2f03>] setattr_truncate+0x73/0x210
[42949831.710000] [<c01ad384>] permission_common+0x24/0x40
[42949831.710000] [<c01ad360>] permission_common+0x0/0x40
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0163119>] __link_path_walk+0x89/0xc40
[42949831.710000] [<c01c30fe>] setattr_unix_file+0x5e/0xc0
[42949831.710000] [<c016f58f>] notify_change+0xcf/0x2d5
[42949831.710000] [<c0163d3f>] link_path_walk+0x6f/0xe0
[42949831.710000] [<c0153e9b>] do_truncate+0x4b/0x70
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0164704>] may_open+0x184/0x1d0
[42949831.710000] [<c01647d5>] open_namei+0x85/0x560
[42949831.710000] [<c0154fe2>] filp_open+0x22/0x50
[42949831.710000] [<c01551ad>] get_unused_fd+0x4d/0xb0
[42949831.710000] [<c01552c1>] do_sys_open+0x41/0xd0
[42949831.710000] [<c0102f49>] syscall_call+0x7/0xb
[42949831.710000] Code: e8 69 ff ff ff 89 da b9 20 6f 13 c0 c7 04 24 02 00 00 00 e8 e6 77 22 00 83 c4 20 5b c3 90 53 89 c3 0f ba 30 00 19 c0 85 c0 75 08 <0f> 0b e0 01 f8 6a 38 c0 89 d8 e8 34 ff ff ff 89 da 31 c9 5b e9
[42949831.710000]

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-17 7:50 ` Sander
@ 2005-11-17 10:12 ` Sander
2005-11-17 10:15 ` Sander
0 siblings, 1 reply; 19+ messages in thread
From: Sander @ 2005-11-17 10:12 UTC (permalink / raw)
To: Sander; +Cc: Neil Brown, Andrew Morton, linux-kernel, reiserfs-dev

Sander wrote (ao):
# Neil Brown wrote (ao):
# > On Wednesday November 16, akpm@osdl.org wrote:
# > > Sander <sander@humilis.net> wrote:
# > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
# > > > try this:
# > >
# > > It oopsed in reiser4. reiserfs-dev added to Cc...
# > >
# >
# > Hmm... It appears that md/bitmap is calling prepare_write and
# > commit_write with 'file' as NULL - this works for some filesystems,
# > but not for reiser4.
# >
# > Does this patch help.
#
# Something changed, but it didn't fix it it seems:
#
# # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: RUN_ARRAY failed: No such file or directory

FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
which is tmpfs, and also happens when I attach both loop0 and loop1 to
files on tmpfs.

This would suggest that reiser4 is not solely at fault?

The difference btw is that I can reboot with 'shutdown -r now'
instead of sysrq. And that mdadm hangs:

# mdadm -C /dev/md1 --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: RUN_ARRAY failed: No such file or directory

# mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
Continue creating array? yes
[hang, no prompt, no reaction to ctrl-c, etc]

[42949549.780000] md: bind<loop0>
[42949549.780000] md: bind<loop1>
[42949549.780000] md: md1: raid array is not clean -- starting background reconstruction
[42949549.790000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949549.790000] md1: bitmap file is out of date, doing full recovery
[42949549.790000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 524288
[42949549.790000] Bad page state at free_hot_cold_page (in process 'mdadm', page c10dcc20)
[42949549.790000] flags:0x80000019 mapping:f5155c84 mapcount:0 count:0
[42949549.790000] Backtrace:
[42949549.790000] [<c013b320>] bad_page+0x70/0xb0
[42949549.790000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
[42949549.790000] [<c02b0a90>] bitmap_file_put+0x30/0x70
[42949549.790000] [<c02b1f8e>] bitmap_free+0x1e/0xb0
[42949549.790000] [<c02b2126>] bitmap_create+0xd6/0x2a0
[42949549.790000] [<c02ab95a>] do_md_run+0x2ba/0x500
[42949549.790000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
[42949549.790000] [<c0179034>] mpage_writepages+0x124/0x3d0
[42949549.790000] [<c013c23e>] __pagevec_free+0x3e/0x60
[42949549.790000] [<c013eff9>] release_pages+0x29/0x160
[42949549.790000] [<c02adb81>] md_ioctl+0x5a1/0x630
[42949549.790000] [<c0137918>] find_get_pages+0x18/0x40
[42949549.790000] [<c02ad5e0>] md_ioctl+0x0/0x630
[42949549.790000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
[42949549.790000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
[42949549.790000] [<c015e158>] block_ioctl+0x18/0x20
[42949549.790000] [<c015e140>] block_ioctl+0x0/0x20
[42949549.790000] [<c01674ff>] do_ioctl+0x1f/0x70
[42949549.790000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
[42949549.790000] [<c0156c91>] __fput+0xe1/0x140
[42949549.790000] [<c016785d>] sys_ioctl+0x3d/0x70
[42949549.790000] [<c0102f49>] syscall_call+0x7/0xb
[42949549.790000] Trying to fix it up, but a reboot is needed
[42949549.790000] md1: failed to create bitmap (524288)
[42949549.790000] md: pers->run() failed ...
[42949549.790000] md: md1 stopped.
[42949549.790000] md: unbind<loop1>
[42949549.790000] md: export_rdev(loop1)
[42949549.790000] md: unbind<loop0>
[42949549.790000] md: export_rdev(loop0)

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-17 10:12 ` Sander
@ 2005-11-17 10:15 ` Sander
2005-11-21 23:07 ` Please help me understand ->writepage. Was " Neil Brown
0 siblings, 1 reply; 19+ messages in thread
From: Sander @ 2005-11-17 10:15 UTC (permalink / raw)
To: Sander; +Cc: Neil Brown, Andrew Morton, linux-kernel, reiserfs-dev

Sander wrote (ao):
# Sander wrote (ao):
# # Neil Brown wrote (ao):
# # > On Wednesday November 16, akpm@osdl.org wrote:
# # > > Sander <sander@humilis.net> wrote:
# # > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
# # > > > try this:
# # > >
# # > > It oopsed in reiser4. reiserfs-dev added to Cc...
# # > >
# # >
# # > Hmm... It appears that md/bitmap is calling prepare_write and
# # > commit_write with 'file' as NULL - this works for some filesystems,
# # > but not for reiser4.
# # >
# # > Does this patch help.
# #
# # Something changed, but it didn't fix it it seems:
# #
# # # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# # mdadm: RUN_ARRAY failed: No such file or directory
#
# FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
# which is tmpfs, and also happens when I attach both loop0 and loop1 to
# files on tmpfs.
#
# This would suggest that reiser4 is not solely at fault?
#
# The difference btw is that I can reboot with 'shutdown -r now'
# instead of sysrq. And that mdadm hangs:
#
# # mdadm -C /dev/md1 --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: RUN_ARRAY failed: No such file or directory
#
# # mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: /dev/loop0 appears to be part of a raid array:
#     level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# mdadm: /dev/loop1 appears to be part of a raid array:
#     level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# Continue creating array? yes
# [hang, no prompt, no reaction to ctrl-c, etc]

And even more info. It seems mdadm spins:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  749 root      25   0  1696  568  492 R 99.9  0.1   8:32.50 mdadm

Would sysrq-t be useful?

# [42949549.780000] md: bind<loop0>
# [42949549.780000] md: bind<loop1>
# [42949549.780000] md: md1: raid array is not clean -- starting background reconstruction
# [42949549.790000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
# [42949549.790000] md1: bitmap file is out of date, doing full recovery
# [42949549.790000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 524288
# [42949549.790000] Bad page state at free_hot_cold_page (in process 'mdadm', page c10dcc20)
# [42949549.790000] flags:0x80000019 mapping:f5155c84 mapcount:0 count:0
# [42949549.790000] Backtrace:
# [42949549.790000] [<c013b320>] bad_page+0x70/0xb0
# [42949549.790000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
# [42949549.790000] [<c02b0a90>] bitmap_file_put+0x30/0x70
# [42949549.790000] [<c02b1f8e>] bitmap_free+0x1e/0xb0
# [42949549.790000] [<c02b2126>] bitmap_create+0xd6/0x2a0
# [42949549.790000] [<c02ab95a>] do_md_run+0x2ba/0x500
# [42949549.790000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
# [42949549.790000] [<c0179034>] mpage_writepages+0x124/0x3d0
# [42949549.790000] [<c013c23e>] __pagevec_free+0x3e/0x60
# [42949549.790000] [<c013eff9>] release_pages+0x29/0x160
# [42949549.790000] [<c02adb81>] md_ioctl+0x5a1/0x630
# [42949549.790000] [<c0137918>] find_get_pages+0x18/0x40
# [42949549.790000] [<c02ad5e0>] md_ioctl+0x0/0x630
# [42949549.790000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
# [42949549.790000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
# [42949549.790000] [<c015e158>] block_ioctl+0x18/0x20
# [42949549.790000] [<c015e140>] block_ioctl+0x0/0x20
# [42949549.790000] [<c01674ff>] do_ioctl+0x1f/0x70
# [42949549.790000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
# [42949549.790000] [<c0156c91>] __fput+0xe1/0x140
# [42949549.790000] [<c016785d>] sys_ioctl+0x3d/0x70
# [42949549.790000] [<c0102f49>] syscall_call+0x7/0xb
# [42949549.790000] Trying to fix it up, but a reboot is needed
# [42949549.790000] md1: failed to create bitmap (524288)
# [42949549.790000] md: pers->run() failed ...
# [42949549.790000] md: md1 stopped.
# [42949549.790000] md: unbind<loop1>
# [42949549.790000] md: export_rdev(loop1)
# [42949549.790000] md: unbind<loop0>
# [42949549.790000] md: export_rdev(loop0)

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-17 10:15 ` Sander
@ 2005-11-21 23:07 ` Neil Brown
2005-11-21 23:30 ` Jeff Garzik
2005-11-21 23:51 ` Andrew Morton
0 siblings, 2 replies; 19+ messages in thread
From: Neil Brown @ 2005-11-21 23:07 UTC (permalink / raw)
To: sander; +Cc: Andrew Morton, linux-kernel, reiserfs-dev

On Thursday November 17, sander@humilis.net wrote:
> Sander wrote (ao):
> # Sander wrote (ao):
> # # Neil Brown wrote (ao):
> # # > On Wednesday November 16, akpm@osdl.org wrote:
> # # > > Sander <sander@humilis.net> wrote:
> # # > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> # # > > > try this:
> # # > >
> # # > > It oopsed in reiser4. reiserfs-dev added to Cc...
> # # > >
> # # >
> # # > Hmm... It appears that md/bitmap is calling prepare_write and
> # # > commit_write with 'file' as NULL - this works for some filesystems,
> # # > but not for reiser4.
> # # >
> # # > Does this patch help.
> # #
> # # Something changed, but it didn't fix it it seems:
> # #
> # # # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
> # # mdadm: RUN_ARRAY failed: No such file or directory
> #
> # FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
> # which is tmpfs, and also happens when I attach both loop0 and loop1 to
> # files on tmpfs.
> #
> # This would suggest that reiser4 is not solely at fault?
> #

No, there is something very wrong in md/bitmap.c's handling of writing
to a file. It was developed for, and tested on, ext3 and doesn't seem
to work anywhere else.... and I don't understand enough to fix it.

Help ???

What md/bitmap wants to do is effectively memory map the file, make
updates to pages occasionally, flush those pages out to storage, and
wait for the flush to complete. It doesn't exactly memory map. It
just reads all the pages and keeps them in an array (holding a
reference to each).

To write the pages out it effectively does ->prepare_write,
->commit_write, and then ->writepage.
I'm not sure that prepare/commit is needed, but they don't seem to be
the problem. writepage is.

For tmpfs at least, writepage disconnects the page from the pagecache
(via move_to_swap_cache), so the page that we are holding is no longer
part of the file and, significantly, page->mapping become NULL.
This suggests that the ->writepage usage is broken.
However I tried to see what 'msync' does for real memory mapped files,
and it eventually calls ->writepage too. So how does that work??

Any advice would be most welcome!

Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 19+ messages in thread
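A userspace analogy only, not the md/bitmap kernel code: the pattern Neil describes (map the file, update a few bytes, flush them out, wait for the flush) looks roughly like the following with mmap() and msync(MS_SYNC). The helper name set_bit_and_flush and its parameters are hypothetical and do not come from the thread.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set one bit in a file-backed bitmap and wait
 * until the change has been flushed.
 * Returns 0 on success, -1 on error.
 */
int set_bit_and_flush(const char *path, size_t file_size, size_t bit)
{
	unsigned char *map;
	int fd, ret;

	/* Refuse bits that fall outside the mapped file. */
	if (bit / 8 >= file_size)
		return -1;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	/* Size the file so every mapped byte is backed by it. */
	if (ftruncate(fd, (off_t)file_size) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The "occasional update": dirty a single byte of the mapping. */
	map[bit / 8] |= (unsigned char)(1u << (bit % 8));

	/* MS_SYNC blocks until the dirty pages have been written back. */
	ret = msync(map, file_size, MS_SYNC);

	munmap(map, file_size);
	close(fd);
	return ret;
}
```

In this userspace form the kernel decides how the dirty page is written back; the caller never invokes a per-filesystem writepage itself, which is the part md/bitmap was doing by hand.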
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-21 23:07 ` Please help me understand ->writepage. Was " Neil Brown
@ 2005-11-21 23:30 ` Jeff Garzik
2005-11-21 23:51 ` Andrew Morton
1 sibling, 0 replies; 19+ messages in thread
From: Jeff Garzik @ 2005-11-21 23:30 UTC (permalink / raw)
To: Neil Brown; +Cc: sander, Andrew Morton, linux-kernel, reiserfs-dev

On Tue, Nov 22, 2005 at 10:07:41AM +1100, Neil Brown wrote:
> To write the pages out it effectively does ->prepare_write,
> ->commit_write, and then ->writepage.
> I'm not sure that prepare/commit is needed, but they don't seem to be
> the problem. writepage is.

That's a bit weird. Typically you have two separate callpaths,
non-page-aligned (prepare_write + commit_write) or writepage(s). Not
both.

Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-21 23:07 ` Please help me understand ->writepage. Was " Neil Brown
2005-11-21 23:30 ` Jeff Garzik
@ 2005-11-21 23:51 ` Andrew Morton
2005-11-22 3:12 ` Neil Brown
1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2005-11-21 23:51 UTC (permalink / raw)
To: Neil Brown; +Cc: sander, linux-kernel, reiserfs-dev

Neil Brown <neilb@suse.de> wrote:
>
> Help ???

Indeed. tmpfs is crackpottery.

> What md/bitmap wants to do is effectively memory map the file, make
> updates to pages occasionally, flush those pages out to storage, and
> wait for the flush to complete. It doesn't exactly memory map. It
> just reads all the pages and keeps them in an array (holding a
> reference to each).
>
> To write the pages out it effectively does ->prepare_write,
> ->commit_write, and then ->writepage.
> I'm not sure that prepare/commit is needed, but they don't seem to be
> the problem. writepage is.
>
> For tmpfs at least, writepage disconnects the page from the pagecache
> (via move_to_swap_cache), so the page that we are holding is no longer
> part of the file and, significantly, page->mapping become NULL.
> This suggests that the ->writepage usage is broken.
> However I tried to see what 'msync' does for real memory mapped files,
> and it eventually calls ->writepage too. So how does that work??
>
> Any advice would be most welcome!

Skip the writepage if !mapping_cap_writeback_dirty(page->mapping), I guess.
Or, if appropriate, just sync the file. Use filemap_fdatawrite() or even
refactor do_fsync() and use most of that.

Also, write_page() doesn't need to run set_page_dirty(); ->commit_write()
will do that.

Several kmap()s in there which can become kmap_atomic().

bitmap_init_from_disk() might be leaking bitmap->filemap on kmalloc-failed
error path.

bitmap->filemap_attr can be allocated with kzalloc() now.
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-21 23:51 ` Andrew Morton
@ 2005-11-22 3:12 ` Neil Brown
2005-11-22 3:47 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Neil Brown @ 2005-11-22 3:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: sander, linux-kernel, reiserfs-dev

On Monday November 21, akpm@osdl.org wrote:
> Neil Brown <neilb@suse.de> wrote:
> >
> > Help ???
>
> Indeed. tmpfs is crackpottery.

Ok, that explains a lot... :-)

> >
> > Any advice would be most welcome!
>
> Skip the writepage if !mapping_cap_writeback_dirty(page->mapping), I guess.
> Or, if appropriate, just sync the file. Use filemap_fdatawrite() or even
> refactor do_fsync() and use most of that.

Uhm, what would you think of testing mapping_cap_writeback_dirty in
write_one_page?? If you don't like it, I can take it into write_page.

>
> Also, write_page() doesn't need to run set_page_dirty(); ->commit_write()
> will do that.

Ok.... but I think I'm dropping prepare_write / commit_write.

>
> Several kmap()s in there which can become kmap_atomic().

I've made them all kmap_atomic.

>
> bitmap_init_from_disk() might be leaking bitmap->filemap on kmalloc-failed
> error path.

It looks that way, but actually not. bitmap_create requires that
bitmap_destroy always be called afterwards, even on an error. Not the
best interface I'd agree...

>
> bitmap->filemap_attr can be allocated with kzalloc() now.

Yes, thanks.

So Sander, could you try this patch for main against reiser4? It
seems to work on ext3 and tmpfs and has some chance of not mucking up
on reiser4.

Thanks, NeilBrown ===File /home/src/mm/.patches/applied/014MdBitmapFix======== Status: devel Hopefully make md/bitmaps work on files other than ext3 Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/bitmap.c | 64 +++++++++++++++++++------------------------------- ./mm/page-writeback.c | 4 +++ 2 files changed, 29 insertions(+), 39 deletions(-) diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c --- ./drivers/md/bitmap.c~current~ 2005-11-22 14:06:53.000000000 +1100 +++ ./drivers/md/bitmap.c 2005-11-22 14:07:05.000000000 +1100 @@ -310,7 +310,6 @@ static int write_sb_page(mddev_t *mddev, */ static int write_page(struct bitmap *bitmap, struct page *page, int wait) { - int ret = -ENOMEM; if (bitmap->file == NULL) return write_sb_page(bitmap->mddev, bitmap->offset, page, wait); @@ -326,15 +325,6 @@ static int write_page(struct bitmap *bit } } - ret = page->mapping->a_ops->prepare_write(bitmap->file, page, 0, PAGE_SIZE); - if (!ret) - ret = page->mapping->a_ops->commit_write(bitmap->file, page, 0, - PAGE_SIZE); - if (ret) { - unlock_page(page); - return ret; - } - set_page_dirty(page); /* force it to be written out */ if (!wait) { @@ -406,11 +396,11 @@ int bitmap_update_sb(struct bitmap *bitm return 0; } spin_unlock_irqrestore(&bitmap->lock, flags); - sb = (bitmap_super_t *)kmap(bitmap->sb_page); + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); sb->events = cpu_to_le64(bitmap->mddev->events); if (!bitmap->mddev->degraded) sb->events_cleared = cpu_to_le64(bitmap->mddev->events); - kunmap(bitmap->sb_page); + kunmap_atomic(bitmap->sb_page, KM_USER0); return write_page(bitmap, bitmap->sb_page, 1); } @@ -421,7 +411,7 @@ void bitmap_print_sb(struct bitmap *bitm if (!bitmap || !bitmap->sb_page) return; - sb = (bitmap_super_t *)kmap(bitmap->sb_page); + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap)); printk(KERN_DEBUG " magic: %08x\n", 
le32_to_cpu(sb->magic)); printk(KERN_DEBUG " version: %d\n", le32_to_cpu(sb->version)); @@ -440,7 +430,7 @@ void bitmap_print_sb(struct bitmap *bitm printk(KERN_DEBUG " sync size: %llu KB\n", (unsigned long long)le64_to_cpu(sb->sync_size)/2); printk(KERN_DEBUG "max write behind: %d\n", le32_to_cpu(sb->write_behind)); - kunmap(bitmap->sb_page); + kunmap_atomic(bitmap->sb_page, KM_USER0); } /* read the superblock from the bitmap file and initialize some bitmap fields */ @@ -466,7 +456,7 @@ static int bitmap_read_sb(struct bitmap return err; } - sb = (bitmap_super_t *)kmap(bitmap->sb_page); + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); if (bytes_read < sizeof(*sb)) { /* short read */ printk(KERN_INFO "%s: bitmap file superblock truncated\n", @@ -535,7 +525,7 @@ success: bitmap->events_cleared = bitmap->mddev->events; err = 0; out: - kunmap(bitmap->sb_page); + kunmap_atomic(bitmap->sb_page, KM_USER0); if (err) bitmap_print_sb(bitmap); return err; @@ -560,7 +550,7 @@ static void bitmap_mask_state(struct bit } page_cache_get(bitmap->sb_page); spin_unlock_irqrestore(&bitmap->lock, flags); - sb = (bitmap_super_t *)kmap(bitmap->sb_page); + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); switch (op) { case MASK_SET: sb->state |= bits; break; @@ -568,7 +558,7 @@ static void bitmap_mask_state(struct bit break; default: BUG(); } - kunmap(bitmap->sb_page); + kunmap_atomic(bitmap->sb_page, KM_USER0); page_cache_release(bitmap->sb_page); } @@ -621,8 +611,7 @@ static void bitmap_file_unmap(struct bit spin_unlock_irqrestore(&bitmap->lock, flags); while (pages--) - if (map[pages]->index != 0) /* 0 is sb_page, release it below */ - page_cache_release(map[pages]); + page_cache_release(map[pages]); kfree(map); kfree(attr); @@ -771,7 +760,7 @@ static void bitmap_file_set_bit(struct b set_bit(bit, kaddr); else ext2_set_bit(bit, kaddr); - kunmap_atomic(kaddr, KM_USER0); + kunmap_atomic(page, KM_USER0); PRINTK("set file bit %lu page %lu\n", bit, 
page->index); /* record page number so it gets flushed to disk when unplug occurs */ @@ -854,6 +843,7 @@ static int bitmap_init_from_disk(struct unsigned long bytes, offset, dummy; int outofdate; int ret = -ENOSPC; + void *paddr; chunks = bitmap->chunks; file = bitmap->file; @@ -887,12 +877,10 @@ static int bitmap_init_from_disk(struct if (!bitmap->filemap) goto out; - bitmap->filemap_attr = kmalloc(sizeof(long) * num_pages, GFP_KERNEL); + bitmap->filemap_attr = kzalloc(sizeof(long) * num_pages, GFP_KERNEL); if (!bitmap->filemap_attr) goto out; - memset(bitmap->filemap_attr, 0, sizeof(long) * num_pages); - oldindex = ~0L; for (i = 0; i < chunks; i++) { @@ -901,8 +889,6 @@ static int bitmap_init_from_disk(struct bit = file_page_offset(i); if (index != oldindex) { /* this is a new page, read it in */ /* unmap the old page, we're done with it */ - if (oldpage != NULL) - kunmap(oldpage); if (index == 0) { /* * if we're here then the superblock page @@ -910,6 +896,7 @@ static int bitmap_init_from_disk(struct * we've already read it in, so just use it */ page = bitmap->sb_page; + page_cache_get(page); offset = sizeof(bitmap_super_t); } else if (file) { page = read_page(file, index, &dummy); @@ -925,18 +912,18 @@ static int bitmap_init_from_disk(struct oldindex = index; oldpage = page; - kmap(page); if (outofdate) { /* * if bitmap is out of date, dirty the * whole page and write it out */ - memset(page_address(page) + offset, 0xff, + paddr = kmap_atomic(page, KM_USER0); + memset(paddr + offset, 0xff, PAGE_SIZE - offset); + kunmap_atomic(page, KM_USER0); ret = write_page(bitmap, page, 1); if (ret) { - kunmap(page); /* release, page not in filemap yet */ page_cache_release(page); goto out; @@ -945,10 +932,12 @@ static int bitmap_init_from_disk(struct bitmap->filemap[bitmap->file_pages++] = page; } + paddr = kmap_atomic(page, KM_USER0); if (bitmap->flags & BITMAP_HOSTENDIAN) - b = test_bit(bit, page_address(page)); + b = test_bit(bit, paddr); else - b = ext2_test_bit(bit, 
page_address(page)); + b = ext2_test_bit(bit, paddr); + kunmap_atomic(page, KM_USER0); if (b) { /* if the disk bit is set, set the memory bit */ bitmap_set_memory_bits(bitmap, i << CHUNK_BLOCK_SHIFT(bitmap), @@ -963,9 +952,6 @@ static int bitmap_init_from_disk(struct ret = 0; bitmap_mask_state(bitmap, BITMAP_STALE, MASK_UNSET); - if (page) /* unmap the last page */ - kunmap(page); - if (bit_cnt) { /* Kick recovery if any bits were set */ set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery); md_wakeup_thread(bitmap->mddev->thread); @@ -1021,6 +1007,7 @@ int bitmap_daemon_work(struct bitmap *bi int err = 0; int blocks; int attr; + void *paddr; if (bitmap == NULL) return 0; @@ -1077,14 +1064,12 @@ int bitmap_daemon_work(struct bitmap *bi set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); } - kunmap(lastpage); page_cache_release(lastpage); if (err) bitmap_file_kick(bitmap); } else spin_unlock_irqrestore(&bitmap->lock, flags); lastpage = page; - kmap(page); /* printk("bitmap clean at page %lu\n", j); */ @@ -1107,10 +1092,12 @@ int bitmap_daemon_work(struct bitmap *bi -1); /* clear the bit */ + paddr = kmap_atomic(page, KM_USER0); if (bitmap->flags & BITMAP_HOSTENDIAN) - clear_bit(file_page_offset(j), page_address(page)); + clear_bit(file_page_offset(j), paddr); else - ext2_clear_bit(file_page_offset(j), page_address(page)); + ext2_clear_bit(file_page_offset(j), paddr); + kunmap_atomic(page, KM_USER0); } } spin_unlock_irqrestore(&bitmap->lock, flags); @@ -1118,7 +1105,6 @@ int bitmap_daemon_work(struct bitmap *bi /* now sync the final page */ if (lastpage != NULL) { - kunmap(lastpage); spin_lock_irqsave(&bitmap->lock, flags); if (get_page_attr(bitmap, lastpage) &BITMAP_PAGE_NEEDWRITE) { clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); diff ./mm/page-writeback.c~current~ ./mm/page-writeback.c --- ./mm/page-writeback.c~current~ 2005-11-22 14:06:53.000000000 +1100 +++ ./mm/page-writeback.c 2005-11-22 
14:07:05.000000000 +1100 @@ -583,6 +583,10 @@ int write_one_page(struct page *page, in }; BUG_ON(!PageLocked(page)); + if (!mapping_cap_writeback_dirty(mapping)) { + unlock_page(page); + return ret; + } if (wait) wait_on_page_writeback(page); ============================================================ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-22 3:12 ` Neil Brown
@ 2005-11-22 3:47 ` Andrew Morton
2005-11-22 10:34 ` Sander
2005-11-22 12:00 ` Please help me understand ->writepage. " Anton Altaparmakov
2 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2005-11-22 3:47 UTC (permalink / raw)
To: Neil Brown; +Cc: sander, linux-kernel, reiserfs-dev

Neil Brown <neilb@suse.de> wrote:
>
> Uhm, what would you think of testing mapping_cap_writeback_dirty in
> write_one_page?? If you don't like it, I can take it into write_page.

write_one_page() is a little library function for filesystems to call, and
filesystems implicitly know whether or not they have backing store. So
probably it's best to do this test in the (unusual) caller.

> > Also, write_page() doesn't need to run set_page_dirty(); ->commit_write()
> > will do that.
>
> Ok.... but I think I'm dropping prepare_write / commit_write.
>

Those functions do some pretty handy things, like creating disk blocks
within the file to back the page.

If someone comes along and ftruncate()s the bitmap file while you're not
looking, what happens? Generally we use i_sem for this sort of thing.

If you know that the page is still mapped into the file then yes, you can do

	lock_page()
	kmap_atomic()
	<modify>
	kunmap_atomic()
	flush_dcache_page()
	set_page_dirty()
	unlock_page()
	write_one_page(wait==1)

but that's rather a lot of work.

bitmap_unplug() looks risky - calling filesystem functions (like
lock_page()) from inside an unplug function. Can this all be called from
the vmscan->writepage path?

It might be simpler and more maintainable to maintain the bitmap in normal
kernel memory, sync it to disk via higher-level entrypoints like
sys_write(), vfs_write(), sys_sync(), do_sync(), etc.
^ permalink raw reply [flat|nested] 19+ messages in thread
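A userspace analogy of Andrew's last suggestion, offered only as a sketch: keep the bitmap in ordinary memory and push it out through the normal write path, then wait for it to reach storage. It uses pwrite() and fsync() rather than any kernel-internal entry point, and the name sync_bitmap_to_file is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write an in-memory bitmap to a file through the
 * ordinary write path and wait for it to reach storage.
 * Returns 0 on success, -1 on error.
 */
int sync_bitmap_to_file(const char *path, const unsigned char *bits, size_t len)
{
	size_t done = 0;
	int fd = open(path, O_WRONLY | O_CREAT, 0600);

	if (fd < 0)
		return -1;

	/* pwrite() may write less than requested, so loop until done. */
	while (done < len) {
		ssize_t n = pwrite(fd, bits + done, len - done, (off_t)done);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}

	/* fsync() returns only after the data has been handed to storage. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

The filesystem allocates blocks and handles truncation races inside its own write path here, which is the work Andrew notes that prepare_write/commit_write were doing.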
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-22 3:12 ` Neil Brown
2005-11-22 3:47 ` Andrew Morton
@ 2005-11-22 10:34 ` Sander
2005-11-24 5:41 ` Please help me understand reiser4_writepage. " Neil Brown
2005-11-22 12:00 ` Please help me understand ->writepage. " Anton Altaparmakov
2 siblings, 1 reply; 19+ messages in thread
From: Sander @ 2005-11-22 10:34 UTC (permalink / raw)
To: Neil Brown; +Cc: Andrew Morton, sander, linux-kernel, reiserfs-dev

Neil Brown wrote (ao):
> On Monday November 21, akpm@osdl.org wrote:
> > bitmap->filemap_attr can be allocated with kzalloc() now.
> Yes, thanks.
>
> So Sander, could you try this patch for main against reiser4? It
> seems to work on ext3 and tmpfs and has some chance of not mucking up
> on reiser4.

It doesn't crash or segfault anymore. It works with the bitmap file on
tmpfs, but not yet on reiser4.

This is kernel 2.6.15-rc1-mm2 with your (Neil Brown's) patch.

loop0 is connected to a file on tmpfs
loop1 to a file on reiser4
/storage/raid1.bitmap is also on reiser4

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: RUN_ARRAY failed: No such file or directory

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1003904 blocks [4/4] [UUUU]

unused devices: <none>

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:09:15 2005
mdadm: /dev/loop1 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:09:15 2005
Continue creating array? yes
mdadm: bitmap file /storage/raid1.bitmap already exists, use --force to overwrite

# mdadm -C /dev/md1 -f --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:09:15 2005
mdadm: /dev/loop1 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:09:15 2005
Continue creating array? yes
mdadm: RUN_ARRAY failed: Success

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1003904 blocks [4/4] [UUUU]

unused devices: <none>

dmesg:
[42949583.660000] loop: loaded (max 8 devices)
[42949655.110000] md: bind<loop0>
[42949655.110000] md: bind<loop1>
[42949655.110000] md: md1: raid array is not clean -- starting background reconstruction
[42949655.110000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949655.110000] md1: bitmap file is out of date, doing full recovery
[42949655.680000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 1
[42949655.680000] md1: failed to create bitmap (1)
[42949655.680000] md: pers->run() failed ...
[42949655.680000] md: md1 stopped.
[42949655.680000] md: unbind<loop1>
[42949655.680000] md: export_rdev(loop1)
[42949655.680000] md: unbind<loop0>
[42949655.680000] md: export_rdev(loop0)
[42949671.480000] md: bind<loop0>
[42949671.480000] md: bind<loop1>
[42949671.480000] md: md1: raid array is not clean -- starting background reconstruction
[42949671.480000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949671.480000] md1: bitmap file is out of date, doing full recovery
[42949671.770000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 1
[42949671.770000] md1: failed to create bitmap (1)
[42949671.770000] md: pers->run() failed ...
[42949671.770000] md: md1 stopped.
[42949671.770000] md: unbind<loop1>
[42949671.770000] md: export_rdev(loop1)
[42949671.770000] md: unbind<loop0>
[42949671.770000] md: export_rdev(loop0)

It does work with the bitmap file on tmpfs:

# mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:20:48 2005
mdadm: /dev/loop1 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Tue Nov 22 11:20:48 2005
Continue creating array? yes
mdadm: array /dev/md1 started.

silo1:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md1 : active raid1 loop1[1] loop0[0]
      509056 blocks [2/2] [UU]
      [>....................] resync = 1.2% (6528/509056) finish=2.5min speed=3264K/sec
      bitmap: 63/63 pages [252KB], 4KB chunk, file: /tmp/raid1.bitmap

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1003904 blocks [4/4] [UUUU]

unused devices: <none>

Is there anything you need me to test further?

Thanks for the patch!

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Please help me understand reiser4_writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-22 10:34 ` Sander
@ 2005-11-24 5:41 ` Neil Brown
0 siblings, 0 replies; 19+ messages in thread
From: Neil Brown @ 2005-11-24 5:41 UTC (permalink / raw)
To: sander; +Cc: Andrew Morton, linux-kernel, reiserfs-dev

On Tuesday November 22, sander@humilis.net wrote:
>
> It doesn't crash or segfault anymore. It works with the bitmap file on
> tmpfs, but not yet on reiser4.
>
> This is kernel 2.6.15-rc1-mm2 with your (Neil Brown's) patch.
>
...
> [42949655.680000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 1
....

Ok, this is interesting... 'status: 1'. That should be either 0 or a
negative errno.

That is printed in bitmap_init_from_disk in drivers/md/bitmap.c
'ret' can only be '1' if that value is returned from 'write_page'

write_page (same file) can only return '1' if that is returned by
write_one_page (mm/page-writeback.c).

write_one_page can only return '1' from a_ops->writepage, which is
presumably reiser4_writepage in fs/reiser4/page_cache.c

This will only return an unchecked value from write_page_by_ent (if
REISER4_USE_ENTD is defined) or emergency_flush.

emergency_flush is in fs/reiser4/emergency_flush.c and it does indeed
return 1 in some circumstances, though I don't really know what
circumstances.

So there may well be something that md/bitmap is doing wrongly, but
reiser4_writepage should not be returning 1 in any case.

Could someone on reiserfs-dev help me understand when
reiser4_writepage returns '1' and what I might be doing to trigger
that?

Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 19+ messages in thread
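The convention Neil relies on here (a status is either 0 or a negative errno) can be illustrated with a small userspace sketch. This is not kernel code and not a fix proposed in the thread; the function name normalize_status and the choice of -EIO for a stray positive value are assumptions made for the example.

```c
#include <errno.h>

/*
 * Hypothetical guard: pass through 0 and negative errno values, and
 * fold any unexpected positive status into -EIO so that callers only
 * ever see values that fit the "0 or negative errno" convention.
 */
int normalize_status(int ret)
{
	if (ret > 0)
		return -EIO;
	return ret;
}
```

A check of this shape at the point where the callback's result is consumed would have turned 'status: 1' into a recognisable error code instead of letting it travel up three call levels unchanged.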
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-22 3:12 ` Neil Brown
2005-11-22 3:47 ` Andrew Morton
2005-11-22 10:34 ` Sander
@ 2005-11-22 12:00 ` Anton Altaparmakov
2005-11-24 5:29 ` Neil Brown
2 siblings, 1 reply; 19+ messages in thread
From: Anton Altaparmakov @ 2005-11-22 12:00 UTC (permalink / raw)
To: Neil Brown; +Cc: Andrew Morton, sander, linux-kernel, reiserfs-dev

On Tue, 22 Nov 2005, Neil Brown wrote:
> On Monday November 21, akpm@osdl.org wrote:
> > Neil Brown <neilb@suse.de> wrote:
> > >
> > > Help ???
> >
> > Indeed. tmpfs is crackpottery.
>
> Ok, that explains a lot... :-)
>
> > > Any advice would be most welcome!
> >
> > Skip the writepage if !mapping_cap_writeback_dirty(page->mapping), I guess.
> > Or, if appropriate, just sync the file. Use filemap_fdatawrite() or even
> > refactor do_fsync() and use most of that.
>
> Uhm, what would you think of testing mapping_cap_writeback_dirty in
> write_one_page?? If you don't like it, I can take it into write_page.
>
> > Also, write_page() doesn't need to run set_page_dirty(); ->commit_write()
> > will do that.
>
> Ok.... but I think I'm dropping prepare_write / commit_write.

That is a good idea given some file systems do not implement them.

> > Several kmap()s in there which can become kmap_atomic().
>
> I've made them all kmap_atomic.

Except you did it wrong... See below...

> > bitmap_init_from_disk() might be leaking bitmap->filemap on kmalloc-failed
> > error path.
>
> It looks that way, but actually not. bitmap_create requires that
> bitmap_destroy always be called afterwards, even on an error. Not the
> best interface I'd agree...
>
> > bitmap->filemap_attr can be allocated with kzalloc() now.
> Yes, thanks.
>
> So Sander, could you try this patch for main against reiser4? It
> seems to work on ext3 and tmpfs and has some chance of not mucking up
> on reiser4.
> > Thanks, > NeilBrown > > ===File /home/src/mm/.patches/applied/014MdBitmapFix======== > Status: devel > > Hopefully make md/bitmaps work on files other than ext3 > > > > Signed-off-by: Neil Brown <neilb@suse.de> > > ### Diffstat output > ./drivers/md/bitmap.c | 64 +++++++++++++++++++------------------------------- > ./mm/page-writeback.c | 4 +++ > 2 files changed, 29 insertions(+), 39 deletions(-) > > diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c > --- ./drivers/md/bitmap.c~current~ 2005-11-22 14:06:53.000000000 +1100 > +++ ./drivers/md/bitmap.c 2005-11-22 14:07:05.000000000 +1100 > @@ -310,7 +310,6 @@ static int write_sb_page(mddev_t *mddev, > */ > static int write_page(struct bitmap *bitmap, struct page *page, int wait) > { > - int ret = -ENOMEM; > > if (bitmap->file == NULL) > return write_sb_page(bitmap->mddev, bitmap->offset, page, wait); > @@ -326,15 +325,6 @@ static int write_page(struct bitmap *bit > } > } > > - ret = page->mapping->a_ops->prepare_write(bitmap->file, page, 0, PAGE_SIZE); > - if (!ret) > - ret = page->mapping->a_ops->commit_write(bitmap->file, page, 0, > - PAGE_SIZE); > - if (ret) { > - unlock_page(page); > - return ret; > - } > - > set_page_dirty(page); /* force it to be written out */ > > if (!wait) { > @@ -406,11 +396,11 @@ int bitmap_update_sb(struct bitmap *bitm > return 0; > } > spin_unlock_irqrestore(&bitmap->lock, flags); > - sb = (bitmap_super_t *)kmap(bitmap->sb_page); > + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > sb->events = cpu_to_le64(bitmap->mddev->events); > if (!bitmap->mddev->degraded) > sb->events_cleared = cpu_to_le64(bitmap->mddev->events); > - kunmap(bitmap->sb_page); > + kunmap_atomic(bitmap->sb_page, KM_USER0); You need to pass in the address not the page, i.e.: kunmap_atomic(sb, KM_USER0); > return write_page(bitmap, bitmap->sb_page, 1); > } > > @@ -421,7 +411,7 @@ void bitmap_print_sb(struct bitmap *bitm > > if (!bitmap || !bitmap->sb_page) > return; > - sb = (bitmap_super_t 
*)kmap(bitmap->sb_page); > + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap)); > printk(KERN_DEBUG " magic: %08x\n", le32_to_cpu(sb->magic)); > printk(KERN_DEBUG " version: %d\n", le32_to_cpu(sb->version)); > @@ -440,7 +430,7 @@ void bitmap_print_sb(struct bitmap *bitm > printk(KERN_DEBUG " sync size: %llu KB\n", > (unsigned long long)le64_to_cpu(sb->sync_size)/2); > printk(KERN_DEBUG "max write behind: %d\n", le32_to_cpu(sb->write_behind)); > - kunmap(bitmap->sb_page); > + kunmap_atomic(bitmap->sb_page, KM_USER0); Again, this should be: kunmap_atomic(sb, KM_USER0); > } > > /* read the superblock from the bitmap file and initialize some bitmap fields */ > @@ -466,7 +456,7 @@ static int bitmap_read_sb(struct bitmap > return err; > } > > - sb = (bitmap_super_t *)kmap(bitmap->sb_page); > + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > > if (bytes_read < sizeof(*sb)) { /* short read */ > printk(KERN_INFO "%s: bitmap file superblock truncated\n", > @@ -535,7 +525,7 @@ success: > bitmap->events_cleared = bitmap->mddev->events; > err = 0; > out: > - kunmap(bitmap->sb_page); > + kunmap_atomic(bitmap->sb_page, KM_USER0); Again: kunmap_atomic(sb, KM_USER0); > if (err) > bitmap_print_sb(bitmap); > return err; > @@ -560,7 +550,7 @@ static void bitmap_mask_state(struct bit > } > page_cache_get(bitmap->sb_page); > spin_unlock_irqrestore(&bitmap->lock, flags); > - sb = (bitmap_super_t *)kmap(bitmap->sb_page); > + sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > switch (op) { > case MASK_SET: sb->state |= bits; > break; > @@ -568,7 +558,7 @@ static void bitmap_mask_state(struct bit > break; > default: BUG(); > } > - kunmap(bitmap->sb_page); > + kunmap_atomic(bitmap->sb_page, KM_USER0); Again: kunmap_atomic(sb, KM_USER0); > page_cache_release(bitmap->sb_page); > } > > @@ -621,8 +611,7 @@ static void bitmap_file_unmap(struct bit > 
spin_unlock_irqrestore(&bitmap->lock, flags); > > while (pages--) > - if (map[pages]->index != 0) /* 0 is sb_page, release it below */ > - page_cache_release(map[pages]); > + page_cache_release(map[pages]); > kfree(map); > kfree(attr); > > @@ -771,7 +760,7 @@ static void bitmap_file_set_bit(struct b > set_bit(bit, kaddr); > else > ext2_set_bit(bit, kaddr); > - kunmap_atomic(kaddr, KM_USER0); > + kunmap_atomic(page, KM_USER0); This one was correct, you broke it. (-: > PRINTK("set file bit %lu page %lu\n", bit, page->index); > > /* record page number so it gets flushed to disk when unplug occurs */ > @@ -854,6 +843,7 @@ static int bitmap_init_from_disk(struct > unsigned long bytes, offset, dummy; > int outofdate; > int ret = -ENOSPC; > + void *paddr; > > chunks = bitmap->chunks; > file = bitmap->file; > @@ -887,12 +877,10 @@ static int bitmap_init_from_disk(struct > if (!bitmap->filemap) > goto out; > > - bitmap->filemap_attr = kmalloc(sizeof(long) * num_pages, GFP_KERNEL); > + bitmap->filemap_attr = kzalloc(sizeof(long) * num_pages, GFP_KERNEL); > if (!bitmap->filemap_attr) > goto out; > > - memset(bitmap->filemap_attr, 0, sizeof(long) * num_pages); > - > oldindex = ~0L; > > for (i = 0; i < chunks; i++) { > @@ -901,8 +889,6 @@ static int bitmap_init_from_disk(struct > bit = file_page_offset(i); > if (index != oldindex) { /* this is a new page, read it in */ > /* unmap the old page, we're done with it */ > - if (oldpage != NULL) > - kunmap(oldpage); > if (index == 0) { > /* > * if we're here then the superblock page > @@ -910,6 +896,7 @@ static int bitmap_init_from_disk(struct > * we've already read it in, so just use it > */ > page = bitmap->sb_page; > + page_cache_get(page); > offset = sizeof(bitmap_super_t); > } else if (file) { > page = read_page(file, index, &dummy); > @@ -925,18 +912,18 @@ static int bitmap_init_from_disk(struct > > oldindex = index; > oldpage = page; > - kmap(page); > > if (outofdate) { > /* > * if bitmap is out of date, dirty the > * whole 
page and write it out > */ > - memset(page_address(page) + offset, 0xff, > + paddr = kmap_atomic(page, KM_USER0); > + memset(paddr + offset, 0xff, > PAGE_SIZE - offset); > + kunmap_atomic(page, KM_USER0); Again: kunmap_atomic(paddr, KM_USER0); > ret = write_page(bitmap, page, 1); > if (ret) { > - kunmap(page); > /* release, page not in filemap yet */ > page_cache_release(page); > goto out; > @@ -945,10 +932,12 @@ static int bitmap_init_from_disk(struct > > bitmap->filemap[bitmap->file_pages++] = page; > } > + paddr = kmap_atomic(page, KM_USER0); > if (bitmap->flags & BITMAP_HOSTENDIAN) > - b = test_bit(bit, page_address(page)); > + b = test_bit(bit, paddr); > else > - b = ext2_test_bit(bit, page_address(page)); > + b = ext2_test_bit(bit, paddr); > + kunmap_atomic(page, KM_USER0); Again: kunmap_atomic(paddr, KM_USER0); > if (b) { > /* if the disk bit is set, set the memory bit */ > bitmap_set_memory_bits(bitmap, i << CHUNK_BLOCK_SHIFT(bitmap), > @@ -963,9 +952,6 @@ static int bitmap_init_from_disk(struct > ret = 0; > bitmap_mask_state(bitmap, BITMAP_STALE, MASK_UNSET); > > - if (page) /* unmap the last page */ > - kunmap(page); > - > if (bit_cnt) { /* Kick recovery if any bits were set */ > set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery); > md_wakeup_thread(bitmap->mddev->thread); > @@ -1021,6 +1007,7 @@ int bitmap_daemon_work(struct bitmap *bi > int err = 0; > int blocks; > int attr; > + void *paddr; > > if (bitmap == NULL) > return 0; > @@ -1077,14 +1064,12 @@ int bitmap_daemon_work(struct bitmap *bi > set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); > spin_unlock_irqrestore(&bitmap->lock, flags); > } > - kunmap(lastpage); > page_cache_release(lastpage); > if (err) > bitmap_file_kick(bitmap); > } else > spin_unlock_irqrestore(&bitmap->lock, flags); > lastpage = page; > - kmap(page); > /* > printk("bitmap clean at page %lu\n", j); > */ > @@ -1107,10 +1092,12 @@ int bitmap_daemon_work(struct bitmap *bi > -1); > > /* clear the bit */ > + paddr = 
kmap_atomic(page, KM_USER0); > if (bitmap->flags & BITMAP_HOSTENDIAN) > - clear_bit(file_page_offset(j), page_address(page)); > + clear_bit(file_page_offset(j), paddr); > else > - ext2_clear_bit(file_page_offset(j), page_address(page)); > + ext2_clear_bit(file_page_offset(j), paddr); > + kunmap_atomic(page, KM_USER0); Again: kunmap_atomic(paddr, KM_USER0); > } > } > spin_unlock_irqrestore(&bitmap->lock, flags); > @@ -1118,7 +1105,6 @@ int bitmap_daemon_work(struct bitmap *bi > > /* now sync the final page */ > if (lastpage != NULL) { > - kunmap(lastpage); > spin_lock_irqsave(&bitmap->lock, flags); > if (get_page_attr(bitmap, lastpage) &BITMAP_PAGE_NEEDWRITE) { > clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); > > diff ./mm/page-writeback.c~current~ ./mm/page-writeback.c > --- ./mm/page-writeback.c~current~ 2005-11-22 14:06:53.000000000 +1100 > +++ ./mm/page-writeback.c 2005-11-22 14:07:05.000000000 +1100 > @@ -583,6 +583,10 @@ int write_one_page(struct page *page, in > }; > > BUG_ON(!PageLocked(page)); > + if (!mapping_cap_writeback_dirty(mapping)) { > + unlock_page(page); > + return ret; > + } > > if (wait) > wait_on_page_writeback(page); Hope this helps. Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 19+ messages in thread
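
The review point in the message above (kunmap_atomic() takes the address that kmap_atomic() returned, not the struct page) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every model_* name below is hypothetical and invented for illustration, none of it is the kernel API, and the slot bookkeeping is a deliberately simplified stand-in for whatever the real implementation does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. The payload is deliberately not
 * the first member, so a pointer to the page and a pointer to its mapped
 * data are different values, as they are in the kernel. */
struct model_page {
	unsigned long flags;
	unsigned char data[64];
};

enum { MODEL_SLOTS = 2 };	/* hypothetical analogue of the KM_* slots */

/* Address currently mapped in each slot, or NULL when the slot is free. */
static void *model_slot[MODEL_SLOTS];

/* Map a page into a slot and return the address of its payload. */
static void *model_kmap_atomic(struct model_page *page, int slot)
{
	assert(slot >= 0 && slot < MODEL_SLOTS);
	assert(model_slot[slot] == NULL);	/* slot must be free */
	model_slot[slot] = page->data;
	return page->data;
}

/* Unmap by address. Returns 0 on success, -1 if addr is not the address
 * that was handed out for this slot (for example, a page pointer). */
static int model_kunmap_atomic(void *addr, int slot)
{
	if (slot < 0 || slot >= MODEL_SLOTS || model_slot[slot] != addr)
		return -1;
	model_slot[slot] = NULL;
	return 0;
}

/* Correct pairing, mirroring the shape of the corrected hunks:
 * the mapped address goes back to the unmap call. */
static int model_update_events(struct model_page *sb_page, unsigned char events)
{
	unsigned char *sb = model_kmap_atomic(sb_page, 0);

	sb[0] = events;
	return model_kunmap_atomic(sb, 0);
}

/* The mistake from the patch: the page pointer is passed to the unmap
 * call. The model rejects it; the slot is then released properly so the
 * model stays usable afterwards. */
static int model_update_events_wrong(struct model_page *sb_page, unsigned char events)
{
	unsigned char *sb = model_kmap_atomic(sb_page, 0);
	int rc;

	sb[0] = events;
	rc = model_kunmap_atomic(sb_page, 0);	/* wrong argument */
	if (rc != 0)
		model_kunmap_atomic(sb, 0);	/* clean up with the right one */
	return rc;
}
```

In this model the wrong call is caught at run time; the sketch makes no claim about how a real kernel of that era behaves when given the page pointer, only that the two pointers are different values and that the unmap side is keyed on the address.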
* Re: Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)
2005-11-22 12:00 ` Please help me understand ->writepage. " Anton Altaparmakov
@ 2005-11-24 5:29 ` Neil Brown
0 siblings, 0 replies; 19+ messages in thread
From: Neil Brown @ 2005-11-24 5:29 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Andrew Morton, sander, linux-kernel, reiserfs-dev
On Tuesday November 22, aia21@cam.ac.uk wrote:
> On Tue, 22 Nov 2005, Neil Brown wrote:
> > I've made them all kmap_atomic.
>
> Except you did it wrong... See below...
>
> > -	kunmap(bitmap->sb_page);
> > +	kunmap_atomic(bitmap->sb_page, KM_USER0);
>
> You need to pass in the address not the page, i.e.:
>

How.. umm... intuitive :-(
Thanks, I'll fix that.

>
> Hope this helps.
>

It does. I really appreciate getting feedback on my code....
I'm sometimes tempted to slip in a few bugs so that when people point
them out to me I know they have read the rest of the code and that
increases my confidence in it (I haven't actually done this... yet). :-)

NeilBrown
* Re: segfault mdadm --write-behind, 2.6.14-mm2
2005-11-16 22:20 ` Andrew Morton
2005-11-16 23:08 ` Neil Brown
@ 2005-11-18 14:18 ` Vladimir V. Saveliev
1 sibling, 0 replies; 19+ messages in thread
From: Vladimir V. Saveliev @ 2005-11-18 14:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: sander, neilb, linux-kernel, reiserfs-dev
Hello

Andrew Morton wrote:
> Sander <sander@humilis.net> wrote:
>>Neil Brown wrote (ao):
>>> If you use mdadm-2.0 and mark a device as --write-mostly, then all
>>> read requests will go to the other device(s) if possible.
>>> e.g.
>>> mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
>>> --write-mostly /dev/realdisk
>>>
>>> Does this suit your needs?
>>>
>>> You can also arrange for the write to the writemostly device to be
>>> 'write-behind' so that the filesystem doesn't wait for the write to
>>> complete. This can reduce write-latency (though not increase write
>>> throughput) at a very small cost of reliability (if the RAM dies, the
>>> disk may not be 100% up-to-date).
>>With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
>>try this:
>
> It oopsed in reiser4. reiserfs-dev added to Cc...
>
>>mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
>>--write-behind /dev/loop1
>>
>>loop0 is attached to a file on tmpfs, and loop1 is attached
>>to a file on a lvm2 volume (reiser4, if that matters).
>>

I tried ext2 on lvm2 and that did not help. So, for now I would assume
that the problem is not in reiser4 but somewhere else.
end of thread, other threads:[~2005-11-24 5:41 UTC | newest]

Thread overview: 19+ messages
2005-09-05 0:46 RAID1 ramdisk patch Wilco Baan Hofman
2005-09-05 1:27 ` Neil Brown
2005-09-05 7:40 ` Wilco Baan Hofman
2005-11-16 13:36 ` segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch) Sander
2005-11-16 22:20 ` Andrew Morton
2005-11-16 23:08 ` Neil Brown
2005-11-17 7:50 ` Sander
2005-11-17 10:12 ` Sander
2005-11-17 10:15 ` Sander
2005-11-21 23:07 ` Please help me understand ->writepage. Was " Neil Brown
2005-11-21 23:30 ` Jeff Garzik
2005-11-21 23:51 ` Andrew Morton
2005-11-22 3:12 ` Neil Brown
2005-11-22 3:47 ` Andrew Morton
2005-11-22 10:34 ` Sander
2005-11-24 5:41 ` Please help me understand reiser4_writepage. " Neil Brown
2005-11-22 12:00 ` Please help me understand ->writepage. " Anton Altaparmakov
2005-11-24 5:29 ` Neil Brown
2005-11-18 14:18 ` segfault mdadm --write-behind, 2.6.14-mm2 Vladimir V. Saveliev