* [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
@ 2004-01-29 22:51 Paul Clements
2004-01-30 22:52 ` Paul Clements
2004-02-09 2:51 ` Neil Brown
0 siblings, 2 replies; 34+ messages in thread
From: Paul Clements @ 2004-01-29 22:51 UTC (permalink / raw)
To: linux-raid; +Cc: ptb, mingo, james.bottomley, Neil Brown

Description
===========
This patch provides the md driver with the ability to track
resync/rebuild progress with a bitmap. It also gives the raid1 driver
the ability to perform asynchronous writes (i.e., writes are
acknowledged before they actually reach the secondary disk). The bitmap
and asynchronous write capabilities are primarily useful when raid1 is
employed in data replication (e.g., with a remote disk served over nbd
as the secondary device). However, the bitmap is also useful for
reducing resync/rebuild times with ordinary (local) raid1, raid5, and
raid6 arrays.

Background
==========
This patch is an adjunct to Peter T. Breuer's raid1 bitmap code (fr1
v2.14, ftp://oboe.it.uc3m.es/pub/Programs/fr1-2.14.tgz). The code was
originally written for 2.4 (I have patches vs. 2.4.19/20 Red Hat and
SuSE kernels, if anyone is interested). The 2.4 version of this patch
has undergone extensive alpha, beta, and stress testing, including a WAN
setup where a 500MB partition was mirrored across the U.S. The 2.6
version of the patch remains as close to the 2.4 version as possible,
while still allowing it to function properly in the 2.6 kernel. The 2.6
code has also been tested quite a bit and is fairly stable.

Features
========

Persistent Bitmap
-----------------
The bitmap tracks which blocks are out of sync between the primary and
secondary disk in a raid1 array (in raid5, the bitmap would indicate
which stripes need to be rebuilt). The bitmap is stored in memory (for
speed) and on disk (for persistence, so that a full resync is never
needed, even after a failure or reboot).
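The mapping from a write's byte offset to a bitmap bit is straightforward; here is a userspace sketch of the idea (the chunk size, array size, and function names are illustrative, not the patch's actual symbols):

```c
#include <stdint.h>

/* Model of the in-memory dirty bitmap: one bit per "bitmap chunk". */
#define CHUNK_SHIFT  12          /* hypothetical 4KB chunks */
#define BITMAP_BYTES 4096

static unsigned char bitmap[BITMAP_BYTES];

/* Mark the chunk containing byte offset 'off' as out of sync. */
static void bitmap_set_dirty(uint64_t off)
{
    uint64_t chunk = off >> CHUNK_SHIFT;        /* which bit */
    bitmap[chunk >> 3] |= 1u << (chunk & 7);    /* set it */
}

static int bitmap_is_dirty(uint64_t off)
{
    uint64_t chunk = off >> CHUNK_SHIFT;
    return (bitmap[chunk >> 3] >> (chunk & 7)) & 1;
}
```

On resync, only chunks whose bit is set need to be copied to the secondary; clean chunks are skipped entirely.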
There is a kernel daemon that periodically (lazily) clears bits in the
bitmap file (this reduces the number and frequency of disk writes to the
bitmap file).

The bitmap can also be rescaled -- i.e., the amount of data that each
bit represents can be changed. A larger chunk size yields a smaller
bitmap and fewer bitmap updates (increased efficiency) at the cost of
reduced bitmap granularity (coarser resyncs).
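An offline rescale that doubles the chunk size amounts to OR-ing each pair of adjacent bits into one bit of a half-sized bitmap; a minimal sketch under that assumption (not the patch's actual code):

```c
#include <stddef.h>

/* Coarsen a bitmap by doubling the chunk size: a new bit is dirty if
 * either of the two old bits it covers was dirty.  'dst' must be
 * zeroed by the caller. */
static void bitmap_coarsen(const unsigned char *src, unsigned char *dst,
                           size_t nbits_src)
{
    for (size_t b = 0; b + 1 < nbits_src; b += 2) {
        int dirty = ((src[b >> 3] >> (b & 7)) & 1) |
                    ((src[(b + 1) >> 3] >> ((b + 1) & 7)) & 1);
        if (dirty)
            dst[(b / 2) >> 3] |= 1u << ((b / 2) & 7);
    }
}
```

Expanding (halving the chunk size) is the inverse: each old bit is copied into two new bits, erring on the side of marking data dirty.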
Currently, the bitmap code has been implemented only for raid1, but it
could easily be leveraged by other raid drivers (namely raid5 and raid6)
by adding a few calls to the bitmap routines in the appropriate places.

Asynchronous Writes
-------------------
The asynchronous write capability allows the raid1 driver to function
more efficiently in data replication environments (i.e., where the
secondary disk is remote). Asynchronous writes allow us to overcome high
network latency by filling the network pipe.
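A toy model of the write window (the real driver tracks in-flight bios; this only illustrates the bounded-outstanding-writes idea behind the `--async=512` limit):

```c
/* Writes are acknowledged immediately as long as fewer than 'max' are
 * outstanding; beyond that the submitter must wait for remote acks. */
struct async_window {
    unsigned outstanding;   /* writes acked locally, not yet remotely */
    unsigned max;           /* the --async limit */
};

static int async_write_start(struct async_window *w)
{
    if (w->outstanding >= w->max)
        return 0;           /* pipe full: caller must wait */
    w->outstanding++;
    return 1;               /* acknowledged before remote completion */
}

static void async_write_done(struct async_window *w)
{
    if (w->outstanding)
        w->outstanding--;   /* remote ack arrived; free a slot */
}
```

With a window larger than the bandwidth-delay product of the link, the mirror can stream at full bandwidth instead of stalling one round trip per write.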

Modifications to mdadm
----------------------
I have modified Neil's mdadm tool to allow it to configure the
additional bitmap and async parameters. The attached patch is against
the 1.2 mdadm release. Briefly, the new options are:
Creation:
mdadm -C /dev/md0 -l 1 -n 2 --persistent --async=512
--bitmap=/tmp/bitmap_md0_file,4096,5 /dev/xxx /dev/yyy
This creates a raid1 array with:
* 2 disks
* a persistent superblock
* asynchronous writes enabled (maximum of 512 outstanding writes)
* bitmap enabled (using the file /tmp/bitmap_md0_file)
* a bitmap chunksize of 4k (bitmap chunksize determines how much data
each bitmap bit represents)
* the bitmap daemon set to wake up every 5 seconds to clear bits in the
bitmap file (if needed)
* /dev/xxx as the primary disk
* /dev/yyy as the backup disk (when asynchronous writes are enabled, the
second disk in the array is labelled as a "backup", indicating that it
is remote, and thus no reads will be issued to the device)
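For a given chunk size, the size of the bitmap follows directly from the device size; a sketch of the arithmetic (rounding up at each step; names are illustrative):

```c
#include <stdint.h>

/* One bit per chunk, rounded up. */
static uint64_t bitmap_bits(uint64_t device_bytes, uint64_t chunk_bytes)
{
    return (device_bytes + chunk_bytes - 1) / chunk_bytes;
}

/* Bytes needed to store those bits in the bitmap file. */
static uint64_t bitmap_file_bytes(uint64_t device_bytes, uint64_t chunk_bytes)
{
    return (bitmap_bits(device_bytes, chunk_bytes) + 7) / 8;
}
```

So a 1GB device at the 4KB chunk size above needs 262144 bits, i.e. a 32KB bitmap file.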
Assembling:
mdadm -A /dev/md0 --bitmap=/tmp/bitmap_md0_file /dev/xxx /dev/yyy
This assembles an existing array and configures it to use a bitmap file.
The bitmap file pathname is not stored in the array superblock, and so
must be specified every time the array is assembled.
Details:
mdadm -D /dev/md0
This will display information about /dev/md0, including some additional
information about the bitmap and async parameters.
I've also added some information to the /proc/mdstat file:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 loop0[0] loop1[1](B)
39936 blocks [2/2] [UU]
async: 0/256 outstanding writes
bitmap: 1/1 pages (15 cached) [64KB], 64KB chunk, file:
/tmp/bitmap_md1
unused devices: <none>
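The "1/1 pages" figure can be reproduced from the numbers above: 39936 1KB blocks at a 64KB bitmap chunk is 624 bits, which fits in a single page of bitmap memory. A sketch of that arithmetic, assuming 4KB pages:

```c
#include <stdint.h>

static uint64_t div_up(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

/* chunks -> bitmap bytes -> pages, rounding up at each step */
static uint64_t pages_needed(uint64_t blocks_kb, uint64_t chunk_kb,
                             uint64_t page_bytes)
{
    uint64_t chunks = div_up(blocks_kb, chunk_kb);  /* one bit per chunk */
    uint64_t bytes  = div_up(chunks, 8);
    return div_up(bytes, page_bytes);
}
```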
More details on the design and implementation can be found in Section 3
of my 2003 OLS Paper:
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Clements-OLS2003.pdf

Patch Location
==============
Finally, the patches are available here:
kernel patch vs. 2.6.2-rc2-bk3
------------------------------
http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_30_2_6_2_RC2_BK3_RELEASE.diff
mdadm patch vs. 1.2.0
---------------------
http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_2_0.diff
So if you're interested, please review, test, ask questions, etc. Any
feedback is welcome.
Thanks,
Paul
^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-01-29 22:51 [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes Paul Clements
@ 2004-01-30 22:52 ` Paul Clements
  2004-02-09  2:51 ` Neil Brown
  1 sibling, 0 replies; 34+ messages in thread
From: Paul Clements @ 2004-01-30 22:52 UTC (permalink / raw)
To: linux-raid, ptb, mingo, james.bottomley, Neil Brown

I've uploaded a patch against the latest mdadm (1.5.0):

http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_5_0.diff

Thanks,
Paul

Paul Clements wrote:
> Description
> ===========
> This patch provides the md driver with the ability to track
> resync/rebuild progress with a bitmap. [...]
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-01-29 22:51 [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes Paul Clements
  2004-01-30 22:52 ` Paul Clements
@ 2004-02-09  2:51 ` Neil Brown
  2004-02-09 19:45   ` Paul Clements
  1 sibling, 1 reply; 34+ messages in thread
From: Neil Brown @ 2004-02-09 2:51 UTC (permalink / raw)
To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley

On Thursday January 29, Paul.Clements@SteelEye.com wrote:
> Description
> ===========
> This patch provides the md driver with the ability to track
> resync/rebuild progress with a bitmap. [...]

Thanks for this. There is obviously some good functionality here, but
I do have a few issues with the patch and the code that I would like
to see addressed first.

- The patch seems to add several separate, though related, pieces of
  functionality. It would be a lot easier to review if they were
  separate. There is (at least):
   - "intent-logging" in the bitmaps.
   - optimising resync using the intents.
   - async writes to non-primary devices
   - 'hotrepair'

- bitmap storage. I gather these are stored on a separate device (in
  a separate filesystem). It would be nice to be able to store them
  on the raid array directly so the optimised-resync works on a
  root-filesystem device as well. There is nearly 64k after the
  superblock that can be used. I think there should be an option to
  store the bitmap on all the (primary) devices after the superblock.
  (This could be added later - a separate patch)

- A question - can the bitmap be "rescaled" online, or only offline?
  (not that I particularly want on-line rescaling) How is it achieved?

- ioctl - By changing the size of mdu_array_info_t you have changed
  the GET_ARRAY_INFO ioctl, so you lose backwards compatibility with
  user-space tools. If you want to return more info with
  GET_ARRAY_INFO (which is reasonable) we need to handle both the old
  and new mdu_array_info_t sizes. Ditto for SET_ARRAY_INFO

- superblock. If you have an array with a bitmap attached, boot an
  old kernel that doesn't understand bitmaps, use the array for a
  while (after a full sync), have an unclean shutdown, and then boot
  back into the new kernel, the out-of-date bitmap will be used to
  optimise resync - incorrectly. Is this possibility handled at all?

- I don't see the point of the new "array_size" personality method.
  Surely the size of the bitmap depends on the device size
  (mddev->size), rather than the array_size - after all, that is what
  md_do_sync iterates over.

- I don't think it is right for "stop" to return EBUSY if there are
  outstanding async writes. It should either wait for the writes to
  complete, or abort them (or both, with a timeout).

- Write requests tend to come in bursts. So when you get a burst of
  write requests, it would be most efficient to set all the on-disk
  bits for all requests in the burst, flush all blocks which have
  changed, and then release the burst of write requests. If I read
  the code correctly, you do one bitmap-block-write for every bit
  that gets set.

- returning -ENOMEM in the make_request function is a no-no. You need
  to either cope with a kmalloc failure, or use a mempool or similar
  to guarantee some memory either now or soon.

- The bitmap_page structure is amusing. On memory failure, you
  highjack the "*map" pointer to form 2 16bit counters, and spend a
  whole 32bit "unsigned int" to flag if this is the case or not.
  Would it be better to use one bit of "counter" as the highjack
  flag, and thereby save a substantial fraction of the size of
  'bitmap_page'?

- What is the relationship between the 'bitmap_fd' field in
  mdu_array_info_s, and the SET_BITMAP_FILE ioctl? They both seem to
  set the bitmap file.

Finally, do you have any measurements of the performance impact of the
intent logging on various workloads?

That'll do for now.

NeilBrown
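Neil's ioctl point is worth illustrating: Linux packs the argument size into the request code itself, so growing `mdu_array_info_t` silently produces a different `GET_ARRAY_INFO` value. A userspace model of the `_IOR` packing (bit layout per the generic/x86 `<asm-generic/ioctl.h>`; md's `MD_MAJOR` is 9):

```c
#include <stdint.h>

/* Field widths of the generic ioctl encoding: number, type, size, dir. */
#define IOC_NRBITS    8
#define IOC_TYPEBITS  8
#define IOC_SIZEBITS  14
#define IOC_READ      2u

/* Equivalent of _IOR(type, nr, t) with sizeof(t) passed explicitly. */
static uint32_t mock_ior(uint32_t type, uint32_t nr, uint32_t size)
{
    return (IOC_READ << (IOC_NRBITS + IOC_TYPEBITS + IOC_SIZEBITS)) |
           (size     << (IOC_NRBITS + IOC_TYPEBITS)) |
           (type     << IOC_NRBITS) | nr;
}
```

Because `size` occupies bits 16-29, two structs of different sizes behind the same (type, nr) pair yield distinct request codes, which is why the kernel can (and must) recognise both the old and new layouts.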
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-02-09  2:51 ` Neil Brown
@ 2004-02-09 19:45   ` Paul Clements
  2004-02-10  0:04     ` Neil Brown
  0 siblings, 1 reply; 34+ messages in thread
From: Paul Clements @ 2004-02-09 19:45 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil Brown wrote:
> On Thursday January 29, Paul.Clements@SteelEye.com wrote:
> > This patch provides the md driver with the ability to track
> > resync/rebuild progress with a bitmap. [...]
>
> Thanks for this.

...and thanks for your feedback. There are some really good
ideas/suggestions in here. A couple of them I had thought about, but
wavered on which way to implement them. I'll respond to each of your
points below...

> - The patch seems to add several separate, though related, pieces of
>   functionality. It would be a lot easier to review if they were
>   separate. [...]

I can fairly easily split the bitmap code from the async write code.
Splitting the smaller bitmap pieces from the rest of the bitmap code,
however, is a little more difficult. Also, the "hotrepair" and
"resync" parts would be so small that I'm not sure it would be worth
it. Would it be alright if I split it into two patches and see what
that looks like?

> - bitmap storage. I gather these are stored on a separate device
>   (in a separate filesystem).

Yes.

> It would be nice to be able to store them on the raid array directly
> so the optimised-resync works on a root-filesystem device as well.
> [...] (This could be added later - a separate patch)

Yes, that would be nice. I'll think about this some and maybe submit
it as a follow-on patch.

> - A question - can the bitmap be "rescaled" online, or only offline?
>   (not that I particularly want on-line rescaling) How is it
>   achieved?

The bitmap file can be rescaled offline by simply expanding or
collapsing the bits as necessary, and then writing the enlarged or
shrunken bitmap file. When the array is started (with a new bitmap
chunk size), everything will work correctly. (The bitmap file can also
be truncated, in which case the driver will assume all the bits to be
dirty and perform a full resync.)

> - ioctl - By changing the size of mdu_array_info_t you have changed
>   the GET_ARRAY_INFO ioctl, so you lose backwards compatibility with
>   user-space tools.

Yes, unfortunately...

> If you want to return more info with GET_ARRAY_INFO (which is
> reasonable) we need to handle both the old and new mdu_array_info_t
> sizes. Ditto for SET_ARRAY_INFO

OK, I'll code this up and include it in the next patch. Some type of
versioning scheme (like what you've done with the superblock) would
probably do the trick.

> - superblock. If you have an array with a bitmap attached, boot an
>   old kernel that doesn't understand bitmaps, use the array for a
>   while [...] and then boot back into the new kernel, the
>   out-of-date bitmap will be used to optimise resync - incorrectly.
>   Is this possibility handled at all?

No, I guess I was assuming that a persistent bitmap could always be
trusted, but in this particular situation, that's not the case. I
guess I'll have to add bitmap event counters to the superblock?

> - I don't see the point of the new "array_size" personality method.
>   Surely the size of the bitmap depends on the device size
>   (mddev->size), rather than the array_size [...]

Yikes...thanks for catching that. It got lost in the translation
between 2.4 and 2.6 (size no longer comes directly from the
superblock). I'll be happy to get rid of that bit of ugliness from the
patch.

> - I don't think it is right for "stop" to return EBUSY if there are
>   outstanding async writes. It should either wait for the writes to
>   complete, or abort them (or both, with a timeout).

I'll see what I can do about this. Should be easy enough to wait for
the writes either to complete or error out.

> - Write requests tend to come in bursts. So when you get a burst of
>   write requests, it would be most efficient to set all the on-disk
>   bits for all requests in the burst, flush all blocks which have
>   changed, and then release the burst of write requests. [...]

That would be a very nice optimization. It would also be pretty
involved. If you'd like, I can look at doing this as a follow-on
patch.

> - returning -ENOMEM in the make_request function is a no-no. You
>   need to either cope with a kmalloc failure, or use a mempool or
>   similar to guarantee some memory either now or soon.

Yes, I opted for the simple route here, but I think you're right; a
mempool really is required. I'll code that up and include it in the
next patch.

> - The bitmap_page structure is amusing.

I like to keep my readers entertained... :)

> highjack the "*map" pointer to form 2 16bit counters, and spend a
> whole 32bit "unsigned int" to flag if this is the case or not.
> Would it be better to use one bit of "counter" as the highjack flag,
> and thereby save a substantial fraction of the size of 'bitmap_page'?

In all seriousness though, I opted for the extra flag because I didn't
want to get into figuring out which bits of a pointer could be used as
flags and which could not, across all the various 32 and 64 bit
architectures...it was easier and less risky just to use a couple more
bytes of memory. I suppose if pages are guaranteed to be aligned on
some boundary, the lowest bit could be used as a flag without any risk
of misinterpretation? At any rate, wasting 4 bytes per page didn't
seem like a terrible loss.

> - What is the relationship between the 'bitmap_fd' field in
>   mdu_array_info_s, and the SET_BITMAP_FILE ioctl? They both seem to
>   set the bitmap file.

Yes, they are the same. mdadm opens the bitmap file and passes down a
file descriptor (bitmap_fd). SET_BITMAP_FILE is used in the case of
Assembling an array (where SET_ARRAY_INFO is called with a NULL array
info pointer), while the bitmap_fd is simply passed down inside the
array info struct in the Create case.

> Finally, do you have any measurements of the performance impact of
> the intent logging on various workloads?

No, I don't have any exact numbers, but I haven't observed a slowdown
in the tests I've performed. However, most of my tests were data
replication setups using async writes, which tends to mitigate any
slowdown caused by the bitmap.

--
Paul
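Paul's hijack scheme, modelled in userspace (field names are illustrative, not the patch's): the pointer slot is reused as two 16-bit counters when the page allocation fails, with a separate word recording the reuse:

```c
#include <stdint.h>

/* On alloc failure, the 'map' slot is reinterpreted as two 16-bit
 * counters.  A whole unsigned int flags the reuse -- the "wasted 31
 * bits" Neil points out. */
struct bitmap_page {
    union {
        char *map;                          /* normal case: page of bits */
        struct { uint16_t lo, hi; } count;  /* hijacked: two counters */
    } u;
    unsigned int hijacked;                  /* 1 if counters are in use */
};
```

The union keeps both uses in the same storage; the `hijacked` flag is what tells readers which interpretation is live.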
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-02-09 19:45 ` Paul Clements
@ 2004-02-10  0:04   ` Neil Brown
  2004-02-10 16:20     ` Paul Clements
  ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Neil Brown @ 2004-02-10 0:04 UTC (permalink / raw)
To: Paul Clements; +Cc: Neil Brown, linux-raid, ptb, mingo, james.bottomley

On Monday February 9, Paul.Clements@SteelEye.com wrote:
> I can fairly easily split the bitmap code from the async write code.
> [...] Would it be alright if I split it into two patches and see
> what that looks like?

Two would certainly be better than 1. Splitting off the "resync" stuff
probably isn't necessary, but the 'hotrepair' seems conceptually quite
separate and it would help to see the patch and the justification
separately, but I won't push it.

> The bitmap file can be rescaled offline by simply expanding or
> collapsing the bits as necessary, and then writing the enlarged or
> shrunken bitmap file. [...]

It would seem to make sense to store the "chunk size" in the bitmap
file. In fact, if the bitmap file had a small header containing:

  magic number
  array uuid
  event count
  bitmap chunk size

Then you could rescale the bitmap without touching the superblock and
would at the same time solve the problem with possibly getting an
out-of-date bitmap.

> OK, I'll code this up and include it in the next patch. Some type of
> versioning scheme (like what you've done with the superblock) would
> probably do the trick.

When you change the structure size, you change the ioctl number so you
can differentiate requests based just on the ioctl. So you would need
to define a "struct mdu_array_info_s_old" and have

  #define GET_ARRAY_INFO     _IOR (MD_MAJOR, 0x11, mdu_array_info_t)
  #define GET_ARRAY_INFO_OLD _IOR (MD_MAJOR, 0x11, struct mdu_array_info_s_old)

and handle each ioctl separately (or translate one into the other
early in md_ioctl).

> > - Write requests tend to come in bursts. [...]
>
> That would be a very nice optimization. It would also be pretty
> involved. If you'd like, I can look at doing this as a follow-on
> patch.

You would need an "unplug" concept. When make_request is given a write
request, it puts the request on a queue and schedules a bitmap update.
You set mddev->queue->unplug_fn to a function which flushes the
outstanding bitmap updates, and then moves the queue of outstanding
write requests to somewhere that raid1d can pick them up and initiate
them.

> > - The bitmap_page structure is amusing.
>
> I like to keep my readers entertained... :)
>
> In all seriousness though, I opted for the extra flag because I
> didn't want to get into figuring out which bits of a pointer could
> be used as flags and which could not [...] At any rate, wasting 4
> bytes per page didn't seem like a terrible loss.

True, it isn't a great loss. It just seemed amusing that you squeezed
two 16bit counters into the pointer (seeming to make best use of
available memory) and then wasted 31 bits. If you put the "highjacked"
flag in the counter (not in the pointer) as e.g. the lsb, doing all
real counting by 2s, it should be straightforward.

> Yes, they are the same. mdadm opens the bitmap file and passes down
> a file descriptor (bitmap_fd). SET_BITMAP_FILE is used in the case
> of Assembling an array [...] while the bitmap_fd is simply passed
> down inside the array info struct in the Create case.

Ahh, I see. That's why you need both. I guess you could create the
array "the old way", stop it, and then re-assemble it and pass down
the fd then. If you kept the bitmap parameters in the head of the
bitmap file rather than in the superblock, you could avoid having to
change kernel-side array-creation altogether.

I'm not actually a big fan of having the kernel create the array. You
might note that for new style superblocks, the kernel will not create
the array at all. You have to have user-space write out the initial
superblocks, and then just have the kernel assemble the array. I would
advocate doing the same thing for the bitmap superblock (if you choose
to have one in the bitmap file) - i.e. userspace sets it up and passes
it to the kernel, and the kernel extracts the information it needs.

NeilBrown
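Neil's suggested alternative can be sketched: steal the counter's low bit as the hijacked flag and count in steps of two, so no separate flag word is needed (illustrative helpers, not the eventual kernel code):

```c
/* The real count lives in the upper bits; the lsb says whether the
 * pointer slot has been hijacked.  Counting by 2s leaves the flag
 * undisturbed. */
static unsigned counter_inc(unsigned c)   { return c + 2; }
static unsigned counter_dec(unsigned c)   { return c - 2; }
static unsigned counter_value(unsigned c) { return c >> 1; }
static unsigned set_hijacked(unsigned c)  { return c | 1; }
static int      is_hijacked(unsigned c)   { return c & 1; }
```

The flag and the count coexist in one word, shrinking `bitmap_page` by the 32-bit flag it previously carried.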
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-02-10  0:04 ` Neil Brown
@ 2004-02-10 16:20   ` Paul Clements
  2004-02-10 16:57   ` Paul Clements
  2004-02-13 20:58   ` Paul Clements
  2 siblings, 0 replies; 34+ messages in thread
From: Paul Clements @ 2004-02-10 16:20 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil Brown wrote:
> When you change the structure size, you change the ioctl number so
> you can differentiate requests based just on the ioctl. [...] and
> handle each ioctl separately (or translate one into the other early
> in md_ioctl).

I was thinking of leaving the original ioctls alone and just adding
two new ioctls that would handle only the new parameters, something
like:

#define GET_EXTRA_ARRAY_INFO _IOR (MD_MAJOR, 0x15, mdu_extra_array_info_t)
#define SET_EXTRA_ARRAY_INFO _IOW (MD_MAJOR, 0x2b, mdu_extra_array_info_t)

/* The extra array info structure is flexible, to allow for future changes */
#define EXTRA_ARRAY_INFO_VERSION 1
typedef struct mdu_extra_array_info_s {
	__u32 extra_info_version;   /* EXTRA_ARRAY_INFO_VERSION */
	__u32 async_max_writes;     /* Max outstanding async writes (0 = sync) */
	__u32 bitmap_fd;            /* The bitmap file descriptor (in only) */
	__u32 bitmap_chunksize;     /* The bitmap chunksize */
	__u32 bitmap_daemon_sleep;  /* The bitmap daemon sleep period */
	char  bitmap_path[256];     /* The bitmap filename (out only) */
	char  pad[4096-256-20];     /* Allow for future additions, empty now */
} mdu_extra_array_info_t;

So for older kernels, the ioctls would just error out and mdadm would
ignore that, and the array would be run using only the old features.
This will also allow me to get rid of the SET_BITMAP_FILE ioctl and
just use the SET_EXTRA_ARRAY_INFO ioctl for Assembling.

Sound OK?

--
Paul
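The fallback Paul describes might look like this in userspace (the ioctl is stubbed so the sketch runs anywhere; which errno an old kernel returns for an unknown md ioctl is an assumption here):

```c
#include <errno.h>

/* Stand-in for ioctl(fd, SET_EXTRA_ARRAY_INFO, &extra) against an old
 * kernel: the request is unrecognised. */
static int fake_ioctl_old_kernel(int fd, unsigned long req, void *arg)
{
    (void)fd; (void)req; (void)arg;
    errno = EINVAL;         /* old md_ioctl rejects unknown requests */
    return -1;
}

/* Returns 1 if extras were configured, 0 if the kernel predates them
 * (run with old features only), -1 on a real error. */
static int set_extra_info(int fd, void *extra)
{
    if (fake_ioctl_old_kernel(fd, 0x2b /* hypothetical */, extra) < 0) {
        if (errno == EINVAL || errno == ENOTTY)
            return 0;       /* old kernel: silently skip the extras */
        return -1;
    }
    return 1;
}
```

This is the degradation path the message describes: mdadm tries the new ioctl, and on a recognisable "unknown ioctl" error simply proceeds without bitmap/async support.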
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-02-10  0:04 ` Neil Brown
  2004-02-10 16:20 ` Paul Clements
@ 2004-02-10 16:57 ` Paul Clements
  2004-02-13 20:58 ` Paul Clements
  2 siblings, 0 replies; 34+ messages in thread
From: Paul Clements @ 2004-02-10 16:57 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil Brown wrote:
> It would seem to make sense to store the "chunk size" in the bitmap
> file. In fact, if the bitmap file had a small header containing:
>   magic number
>   array uuid
>   event count
>   bitmap chunk size
> Then you could rescale the bitmap without touching the superblock
> and would at the same time solve the problem with possibly getting
> an out-of-date bitmap.

That sounds good, as it would lend some more foolproofing to the
bitmap file operations. The only problem is that we wouldn't be able
to support bitmaps without backing files (in-memory only bitmaps). I
guess if we don't think that's a useful feature, we could do this.

We could also add this bitmap file superblock as a follow-on patch.
The code that reads in the bitmap file would simply look for the magic
string in the superblock to determine if the bitmap file is a flat
file (old style) or contains a superblock (new style).

--
Paul
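A sketch of such a bitmap-file superblock and the magic-string detection discussed above (every field name, width, and the magic value here are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header at offset 0 of the bitmap file,
 * following Neil's list: magic, array uuid, event count, chunk size. */
struct bitmap_super {
    char     magic[8];    /* "md_bitm" -- invented for this sketch */
    uint8_t  uuid[16];    /* array uuid, matched against the md superblock */
    uint64_t events;      /* md event count when the bitmap was last valid */
    uint32_t chunksize;   /* bytes of data per bitmap bit */
};

/* Old-style flat files have no magic, so absence of the magic string
 * selects the legacy "every byte is bitmap data" interpretation. */
static int has_bitmap_super(const void *file_start)
{
    return memcmp(file_start, "md_bitm", 7) == 0;
}
```

Comparing `events` against the array superblock's event count is what would catch the stale-bitmap-after-old-kernel-boot scenario from earlier in the thread.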
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Paul Clements @ 2004-02-13 20:58 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

And here is the next set of patches. Tested thoroughly against
2.6.2-rc2-bk3 and compile tested against 2.6.3-rc2.

I've split the kernel code into two patches. The async write patch is
dependent on the bitmap patch. The bitmap patch includes the new
GET/SET_EXTRA_ARRAY_INFO ioctls and a mempool for the bitmap_update
structures (no more -ENOMEM in the I/O request path).

bitmap patch vs. 2.6.3-rc2
--------------------------
http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_30_2_6_3_RC2.diff

async write patch vs. 2.6.3-rc2
-------------------------------
http://parisc-linux.org/~jejb/md_bitmap/md_async_2_30_2_6_3_RC2.diff

mdadm patch vs. 1.5.0
---------------------
http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_5_0-2.diff

For simplicity's sake, the mdadm patch includes both the bitmap and
async code (it's a fairly small patch anyway, and probably wouldn't have
benefited much from a split).

--
Paul
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Neil Brown @ 2004-03-05 5:06 UTC (permalink / raw)
To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley

On Friday February 13, Paul.Clements@SteelEye.com wrote:
> And here is the next set of patches. Tested thoroughly against
> 2.6.2-rc2-bk3 and compile tested against 2.6.3-rc2.
>
> I've split the kernel code into two patches. The async write patch is
> dependent on the bitmap patch. The bitmap patch includes the new
> GET/SET_EXTRA_ARRAY_INFO ioctls and a mempool for the bitmap_update
> structures (no more -ENOMEM in the I/O request path).
>
> bitmap patch vs. 2.6.3-rc2
> --------------------------
> http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_30_2_6_3_RC2.diff
>
> async write patch vs. 2.6.3-rc2
> -------------------------------
> http://parisc-linux.org/~jejb/md_bitmap/md_async_2_30_2_6_3_RC2.diff
>
> mdadm patch vs. 1.5.0
> ---------------------
> http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_5_0-2.diff
>
> For simplicity's sake, the mdadm patch includes both the bitmap and
> async code (it's a fairly small patch anyway, and probably wouldn't have
> benefited much from a split).

I finally had a chance to look at this properly -- or at least at the
"bitmap" patch. There is a lot in there that I don't like.

I was just going to rip out the bits that I wasn't happy with and
produce a minimal patch that I was happy with and which did something
useful, but I ran into problems with the file io in bitmap.c. It is
just simply wrong, and needs a substantial rewrite.

Using "generic_file_write" is wrong. It is a service routine for
filesystems to call. Not an interface for clients of a filesystem to
call. You should look at loop.c. It seems to be closer to correct.
You need to see the files as an "address_space" and use address_space
operations to access it.
   ->readpage to read a page
   ->prepare_write / ->commit_write to write a page
     (or maybe write_page, I'm not sure)
   ->sync_page to sync a page

Apart from that....

The more I think about it, the less I like extra fields being added to
the superblock. I think the bitmap file should have a header with
   magic
   uuid (which must match raid array)
   events counter (which must also match)
   chunk size
   other configurables.

Then to assemble an array with a bitmap you create the bitmap file, and
pass it down with a new ioctl. If the array has a persistent
superblock, md checks the header against the superblock, otherwise it
just trusts you.

You don't need any ioctls to get or set extra information about the
array. Just read the header off the bitmap file.

I'm not 100% sure what R1BIO_NotUptodate is supposed to do, but I'm
fairly sure it does it wrongly, and I don't like the name (as it seems
similar to "R1BIO_Uptodate", but is clearly a lot different).

So, if you can produce a patch which *only* adds a persistent bitmap in
a file, uses it to record which blocks are not in-sync, and optimises
resync using the bitmap, and which uses address_space operations for
fileio, then I will see if it is acceptable, and we can then start
adding bits like hot-repair and async-write etc on top of that.

NeilBrown
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Paul Clements @ 2004-03-05 22:05 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil Brown wrote:
> On Friday February 13, Paul.Clements@SteelEye.com wrote:
> I finally had a chance to look at this properly -- or at least at the
> "bitmap" patch.

Thanks for looking this over...

> Using "generic_file_write" is wrong. It is a service routine for
> filesystems to call. Not an interface for clients of a filesystem to
> call.

Agreed. I'm pretty unfamiliar with some of these interfaces, so I was
not sure which ones were the correct ones to use. After looking over the
interfaces a bit more and with some guidance from a couple of experts, I
have a much better idea about what to do...

> You should look at loop.c. It seems to be closer to correct.
> You need to see the files as an "address_space" and use address_space
> operations to access it.
>    ->readpage to read a page

I'm already using read_cache_page(). Didn't realize that it would
actually extend the file for me automatically. Given that it can do
that, the generic read and write stuff can just be thrown out
completely.

>    ->prepare_write / ->commit_write to write a page

Yes, these combined with read_cache_page are all I need...

> The more I think about it, the less I like extra fields being added
> to the superblock.

OK, fair enough. I'll look at adding a header to the bitmap file and
I'll axe the superblock fields and the new ioctl I created.

> I think the bitmap file should have a header with
>    magic
>    uuid (which must match raid array)
>    events counter (which must also match)
>    chunk size
>    other configurables.

...and maybe a version field and some padding to allow for future
changes.
This all means no more in-memory-only bitmaps, but I think that's not
really a critical feature anyway.

> Then to assemble an array with a bitmap you create the bitmap file,
> and pass it down with a new ioctl.

Right. Probably a lot simpler than the existing method.

> If the array has a persistent superblock, md checks the header
> against the superblock, otherwise it just trusts you.

Yep.

> You don't need any ioctls to get or set extra information about the
> array. Just read the header off the bitmap file.

Yep.

> I'm not 100% sure what R1BIO_NotUptodate is supposed to do, but I'm
> fairly sure it does it wrongly, and I don't like the name (as it
> seems similar to "R1BIO_Uptodate", but is clearly a lot different).

Yes, looking back at this, there were a couple of errors in the way that
bit was handled. I've corrected those and renamed the bit to
R1BIO_Degraded. I think this is a better name as it's closer to what the
bit really signifies. If the bit is set, it means:

1) either we've got a degraded array (only 1 disk), or
2) we've gotten a write failure (and we're about to degrade the array)

In either case, we will not clear the bitmap.

> So, if you can produce a patch which *only* adds a persistent bitmap
> in a file, uses it to record which blocks are not in-sync, and
> optimises resync using the bitmap, and which uses address_space
> operations for fileio, then I will see if it is acceptable, and we can
> then start adding bits like hot-repair and async-write etc on top of
> that.

I'll work on that and get it out as soon as I can...

Thanks again,
Paul
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Paul Clements @ 2004-03-31 18:38 UTC (permalink / raw)
To: Neil Brown, linux-raid, ptb, mingo, james.bottomley

Paul Clements wrote:
> Neil Brown wrote:
> > So, if you can produce a patch which *only* adds a persistent bitmap
> > in a file, uses it to record which blocks are not in-sync, and
> > optimises resync using the bitmap, and which uses address_space
> > operations for fileio, then I will see if it is acceptable, and we can
> > then start adding bits like hot-repair and async-write etc on top of
> > that.
>
> I'll work on that and get it out as soon as I can...

OK, so here it is. The changes from the previous patch are, basically:

1) The patch is a little less intrusive now. The diffs against md/raid1
are smaller as more of the bitmap handling code is now isolated in the
bitmap.c and bitmap.h files. Also, the diff against mdadm, while now a
little larger, is less intrusive as I've created a bitmap.c file in the
mdadm sources, as well.

2) The bitmap file I/O routines have been totally rewritten and are now
much simpler. We now use read_cache_page, prepare_write/commit_write (to
extend the file at init time, if necessary), and bmap/submit_bh (to
"sync" the file pages). The use of bmap and submit_bh is due to a
limitation in jbd that allows only one ongoing transaction per process
(see comment in bitmap.c regarding current->journal_info).

3) The bitmap file now has a superblock containing various configuration
information. The bitmap superblock is checked against the md superblock
when the array is assembled.
Options have been added to mdadm to create and examine bitmap files, and
also to specify the bitmap file to be used with an array:

Create a bitmap file:
--------------------

# mdadm --create-bitmap=65536,3,580480 /tmp/bitmap --force

(this creates a bitmap file with a chunksize of 64KB, a 3 second bitmap
daemon sleep period, and a bitmap (initially dirtied) which is
appropriate for use with an array of 580480 blocks)

Examine a bitmap file:
---------------------

# mdadm --examine-bitmap /tmp/bitmap  (or just: mdadm -X /tmp/bitmap)
        Filename : /tmp/bitmap
           Magic : 6d746962
         Version : 2
            UUID : 997cb579.99d31d20.3014cae8.4bb4bf9d
          Events : 5
           State : OK
       Chunksize : 4 KB
          Daemon : 5s flush period
      Array Size : 580480 (566.88 MiB 594.41 MB)
          Bitmap : 145120 bits (chunks), 145120 dirty (100.0%)

(in addition, mdadm -D <array> gives the bitmap information if a bitmap
is attached to the array)

Create or Assemble array using a bitmap:
---------------------------------------

# mdadm -C /dev/md2 -n 2 -l 1 --bitmap=/tmp/bitmap /dev/sda5 missing

# mdadm -A /dev/md2 --bitmap=/tmp/bitmap /dev/sda5 /dev/sdb5

Patch Location:
--------------

Patch vs. linux 2.6.5-rc2
=========================
http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_31_2_6_5_RC2.diff

Patch vs. mdadm 1.5.0
=====================
http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_5_0-bitmap.diff

(no async write patch has been generated, these patches contain only the
bitmap code)

Notes:
-----

1) an is_create flag was added to do_md_run to tell bitmap_create
whether we are creating or just assembling the array -- this is
necessary since 0.90 superblocks do not have a UUID until one is
generated randomly at array creation time, therefore, we must set the
bitmap UUID equal to this newly generated array UUID when the array is
created

2) bitmap.h was not included in the mdadm patch, but a link (or copy)
must be made to the kernel's include/linux/raid/bitmap.h file in order
to build mdadm now

3) code was added to mdadm to allow creation of arrays with
non-persistent superblocks (also, device size calculation with
non-persistent superblocks was fixed)

4) a fix was made to the hot_remove code to allow a faulty device to be
removed

5) various typo and minor bug fixes were also included in the patches
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Paul Clements @ 2004-04-28 18:10 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown, ptb, mingo, james.bottomley

Since there hasn't been any negative feedback, I assume that this
version is acceptable for inclusion in mainline/-mm?

Thanks,
Paul

Paul Clements wrote:
> OK, so here it is. The changes from the previous patch are, basically:
>
> 1) The patch is a little less intrusive now. The diffs against md/raid1
> are smaller as more of the bitmap handling code is now isolated in the
> bitmap.c and bitmap.h files. Also, the diff against mdadm, while now a
> little larger, is less intrusive as I've created a bitmap.c file in the
> mdadm sources, as well.
>
> 2) The bitmap file I/O routines have been totally rewritten and are now
> much simpler. We now use read_cache_page, prepare_write/commit_write (to
> extend the file at init time, if necessary), and bmap/submit_bh (to
> "sync" the file pages). The use of bmap and submit_bh is due to a
> limitation in jbd that allows only one ongoing transaction per process
> (see comment in bitmap.c regarding current->journal_info).
>
> 3) The bitmap file now has a superblock containing various configuration
> information. The bitmap superblock is checked against the md superblock
> when the array is assembled.
>
> Options have been added to mdadm to create and examine bitmap files, and
> also to specify the bitmap file to be used with an array:
>
> Create a bitmap file:
> --------------------
>
> # mdadm --create-bitmap=65536,3,580480 /tmp/bitmap --force
>
> (this creates a bitmap file with a chunksize of 64KB, a 3 second bitmap
> daemon sleep period, and a bitmap (initially dirtied) which is
> appropriate for use with an array of 580480 blocks)
>
> Examine a bitmap file:
> ---------------------
>
> # mdadm --examine-bitmap /tmp/bitmap  (or just: mdadm -X /tmp/bitmap)
>         Filename : /tmp/bitmap
>            Magic : 6d746962
>          Version : 2
>             UUID : 997cb579.99d31d20.3014cae8.4bb4bf9d
>           Events : 5
>            State : OK
>        Chunksize : 4 KB
>           Daemon : 5s flush period
>       Array Size : 580480 (566.88 MiB 594.41 MB)
>           Bitmap : 145120 bits (chunks), 145120 dirty (100.0%)
>
> (in addition, mdadm -D <array> gives the bitmap information if a bitmap
> is attached to the array)
>
> Create or Assemble array using a bitmap:
> ---------------------------------------
>
> # mdadm -C /dev/md2 -n 2 -l 1 --bitmap=/tmp/bitmap /dev/sda5 missing
>
> # mdadm -A /dev/md2 --bitmap=/tmp/bitmap /dev/sda5 /dev/sdb5
>
> Patch Location:
> --------------
>
> Patch vs. linux 2.6.5-rc2
> =========================
> http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_31_2_6_5_RC2.diff
>
> Patch vs. mdadm 1.5.0
> =====================
> http://parisc-linux.org/~jejb/md_bitmap/mdadm_1_5_0-bitmap.diff
>
> (no async write patch has been generated, these patches contain only the
> bitmap code)
>
> Notes:
> -----
>
> 1) an is_create flag was added to do_md_run to tell bitmap_create
> whether we are creating or just assembling the array -- this is
> necessary since 0.90 superblocks do not have a UUID until one is
> generated randomly at array creation time, therefore, we must set the
> bitmap UUID equal to this newly generated array UUID when the array is
> created
>
> 2) bitmap.h was not included in the mdadm patch, but a link (or copy)
> must be made to the kernel's include/linux/raid/bitmap.h file in order
> to build mdadm now
>
> 3) code was added to mdadm to allow creation of arrays with
> non-persistent superblocks (also, device size calculation with
> non-persistent superblocks was fixed)
>
> 4) a fix was made to the hot_remove code to allow a faulty device to be
> removed
>
> 5) various typo and minor bug fixes were also included in the patches
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Peter T. Breuer @ 2004-04-28 18:53 UTC (permalink / raw)
To: Paul Clements; +Cc: linux-raid, Neil Brown, ptb, mingo, james.bottomley

"Also sprach Paul Clements:"
> Since there hasn't been any negative feedback, I assume that this
> version is acceptable for inclusion in mainline/-mm?

whew ... since I just got out of a %^&% grant application spin, and can
breathe now, you prompted me to take a look.

First thing I notice is md.o is replaced with md-mod.o. Why? That's
gratuitous breakage of every poor admin's scripts that up till now has
been doing an insmod md, or has had a "md" in /etc/modules. Owwww ....
Is the reason perhaps that md.c is no longer the whole story? So that
there is extra linking? I guess so. Why not put the extra stuff in a
separate module and leave md.o be?

I'll not comment on the bitmap.c file (I still need to figure out how it
can possibly work :-). But I like the comments in the file! Looks
comprehensible and very neat. Perhaps a summary of what's in it could be
put up top ... or is that likely to get out of step? Perhaps.

The changes to md.c ... look very benign. I see you add a function,
export a function previously static ... add bitmap sb update to the sb
update fn and export it.

I can't be sure from the diff alone, but it looks as though a bitmap is
always created with bitmap_create, and there is no way to stop it? Is
that right? I'd need to see more context in the diff to figure it out.
Perhaps the calls to bitmap_create and bitmap_destroy don't do anything
unless the sb is marked as "wanting"

get_bitmap_file seems to be intended to talk to a user, but it's static!
Are you going to use it in an ioctl? If so, surely the ioctl part should
do the copy_to_user, and this function should not?
It's also got a kmalloc of a buf inside, and I would have thought you
had loadsa potential bufferspace lying around in the superblocks you
have in memory ... oh well, it's independent this way.

There's a tiny change

-	if (rdev->raid_disk >= 0)
+	if (rdev->raid_disk >= 0 && !rdev->faulty)
 		goto busy;

which needs more context for me to judge! So we don't pretend we're busy
if what we're busy with is faulty? Where are we? It must be a diskop of
some kind. The code following ...

+
+	/* check the sb on the new disk to see whether the bitmap can be used
+	 * to resync it (if not, we dirty the bitmap to cause a full resync) */
+	if (mddev->bitmap) {
+		int bitmap_invalidate = 1;
+		mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
+			mdk_rdev_t, same_set);

looks entirely opaque to me! Loads of if's.

There's an addition of set_bitmap_file as an allowed diskop when we
aren't initialised yet - entirely harmless and correct.

Ah .. here I think we have the ioctl for get_bitmap_file. Yes, but I
would have preferred the copy_to_user here, not in the called routine.
Maybe set_bitmap_file does the same "trick"?

There's some status report output reformatting which is hard to figure.
Maybe only a \n added? Probably because you follow with a bitmap status
report when there is one. Shouldn't this bitmap report be a function in
the bitmap.c file? Or would that be all cyclic in terms of dependencies?

Here there is something I don't get:

-	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
+	/* we don't use the checkpoint if there's a bitmap */
+	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && !mddev->bitmap)

Are you trying to avoid the speed throttling problem here? I guess the
checkpoints are where the speed is recalculated too.
When I patched md.c here to get around the throttle (it doesn't know
that some of the synced blocks are "virtual", thanks to the bitmap
code), I did it by adding an extra array of "discounted blocks" that the
sync code could increment every time it skipped something. Then I
changed the throttle speed calculation in md.c to discount the
discounted blocks each time, and only consider the remainder, which had
been "really" written. So the throttle acted on the real i/o speed, not
the virtual speed. But for printout, I used the virtual speed.

Here's another bit that needs explanation:

-		blk_run_queues();
+		if (need_sync)
+			blk_run_queues();

And here ...

+		/* don't worry about speed limits unless we do I/O */
+		if (!need_sync)
+			continue;

Are you saying that in the corner case when the sync since the previous
checkpoint has been entirely "virtual", we just dive right back round
again? What's this "need_sync" *exactly*? When do you think we "need" a
sync?

Hurrr .. I see changes to raid1.c next. I'll consider them after a stiff
drink.

In summary, the md.c changes look almost certainly benign to me, with
just a couple of one-liners that I would need to look at harder. I would
be grateful if you maybe comment on them here (or even in the code).
Perhaps commenting the patch would be appropriate? Can one add text
between hunks? One can do it before the diff starts.

Peter
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Neil Brown @ 2004-04-29 8:41 UTC (permalink / raw)
To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley

On Wednesday March 31, Paul.Clements@SteelEye.com wrote:

and less than a month later I replied .... I've been busy and had two
weeks leave in there. Sorry.

> Create a bitmap file:
> --------------------
>
> # mdadm --create-bitmap=65536,3,580480 /tmp/bitmap --force

Maybe:
   mdadm --create-bitmap --chunksize=64 --delay=3 --size=580480
???
while more verbose, it is also easier to remember and more in-keeping
with the rest of mdadm's syntax.

> 1) an is_create flag was added to do_md_run to tell bitmap_create
> whether we are creating or just assembling the array -- this is
> necessary since 0.90 superblocks do not have a UUID until one is
> generated randomly at array creation time, therefore, we must set the
> bitmap UUID equal to this newly generated array UUID when the array is
> created

I think this is the wrong approach. You are justifying a design error
by reference to a previous design error.

I think user-space should be completely responsible for creating the
bitmap file including setting the UUID. Either
 1/ add the bitmap after the array has been created.
or
 2/ Create the array in user-space and just get the kernel to
    assemble it (this is what I will almost certainly do in mdadm
    once I get around to supporting version 1 superblocks).
> 3) code was added to mdadm to allow creation of arrays with
> non-persistent superblocks (also, device size calculation with
> non-persistent superblocks was fixed)
>
> 4) a fix was made to the hot_remove code to allow a faulty device to be
> removed
>
> 5) various typo and minor bug fixes were also included in the patches

please, Please, PLEASE, keep separate patches separate. It makes them
much easier to review, and hence makes acceptance much more likely.

In particular, your fix in 4 is, I think, wrong. I agree that there is
a problem here. I don't think your fix is correct.

But, onto the patch.

1/ Why bitmap_alloc_page instead of just kmalloc?
   If every kernel subsystem kept its own private cache of memory in
   case of desperate need, then there would be a lot of memory wastage.
   Unless you have evidence that times when you need to allocate a
   bitmap are frequently times when there is excessive memory pressure,
   I think you should just trust kmalloc.
   On the other hand, if you have reason to believe that the services
   of kmalloc are substantially suboptimal for your needs, you should
   explain why in a comment.

2/ There is a race in bitmap_checkpage.
   I assume that the required postcondition is that (if the arguments
   are valid) either bitmap->bp[page].hijacked or bitmap->bp[page].map
   but not both.
If this is the case, then

+	if ((mappage = bitmap_alloc_page(bitmap)) == NULL) {
+		PRINTK("%s: bitmap map page allocation failed, hijacking\n",
+			bmname(bitmap));
+		/* failed - set the hijacked flag so that we can use the
+		 * pointer as a counter */
+		spin_lock_irqsave(&bitmap->lock, flags);
+		bitmap->bp[page].hijacked = 1;
+		goto out_unlock;
+	}

should become:

+	if ((mappage = bitmap_alloc_page(bitmap)) == NULL) {
+		PRINTK("%s: bitmap map page allocation failed, hijacking\n",
+			bmname(bitmap));
+		/* failed - set the hijacked flag so that we can use the
+		 * pointer as a counter */
+		spin_lock_irqsave(&bitmap->lock, flags);
+		if (!bitmap->bp[page].map) bitmap->bp[page].hijacked = 1;
+		goto out_unlock;
+	}

as someone else could have allocated a bitmap while we were trying and
failing.

3/ Your bmap_page / sync_page is very filesystem-specific.

   bmap_page sets page->private to a linked-list of buffers. However
   page->private "belongs" to the address-space that the page is in,
   which means the filesystem.
   I don't know if any filesystems use page->private for anything other
   than a list of buffers, but they certainly could if they wanted to,
   and if they did, this code would suddenly break.

   I think that if you have your heart set on being able to store the
   bitmap in a file, that using a loop-back mount would be easiest. But
   if you don't want to do that, at least use the correct interface.
   Referring to the code in loop.c would help.

   Another alternative would be to use the approach that swapfile uses.
   i.e. create a list of extents using bmap information, and then do
   direct IO to the device using this extent information. swapfile
   chooses to ignore any extent that is less than a page. You might not
   want to do that, but you wouldn't have to.

4/ I would avoid extending the file in the kernel. It is too easy to do
   that in user space. Just require that the file is the correct size.
5/ I find it very confusing that you kmap a page, and then leave it to
   some function that you call to kunmap the page (unmap_put_page or
   sync_put_page). It looks very unbalanced. I would much rather see
   the kunmap in the same function as the kmap.

6/ It seems odd that bitmap_set_state calls unmap_put_page instead of
   sync_put_page. Surely you want to sync the superblock at this point.

7/ The existence of bitmap_get_state confuses me. Why not store the
   state in the bitmap structure in bitmap_read_sb??

8/ calling md_force_spare(.., 1) to force a sync is definitely wrong.
   You appear to be assuming a raid1 array with exactly two devices.
   Can't you just set recovery_cp to 0, or maybe just set a flag
   somewhere and test it in md_check_recovery??

9/ why don't you just pass "%s_bitmap" as the thread name to
   md_register_thread? As far as I can tell, it would have exactly the
   same effect as the current code without requiring a kmalloc.

10/
+static void bitmap_stop_daemon(struct bitmap *bitmap)
+{
+	mdk_thread_t *daemon;
+	unsigned long flags;
+
+	spin_lock_irqsave(&bitmap->lock, flags);
+	if (!bitmap->daemon) {
+		spin_unlock_irqrestore(&bitmap->lock, flags);
+		return;
+	}
+	daemon = bitmap->daemon;
+	bitmap->daemon = NULL;
+	spin_unlock_irqrestore(&bitmap->lock, flags);
+	md_unregister_thread(daemon); /* destroy the thread */
+}

would look better as:

+static void bitmap_stop_daemon(struct bitmap *bitmap)
+{
+	mdk_thread_t *daemon;
+	unsigned long flags;
+
+	spin_lock_irqsave(&bitmap->lock, flags);
+	daemon = bitmap->daemon;
+	bitmap->daemon = NULL;
+	spin_unlock_irqrestore(&bitmap->lock, flags);
+	if (daemon)
+		md_unregister_thread(daemon); /* destroy the thread */
+}

11/ md_update_sb needs to be called with the mddev locked, and I think
    there are times when you call it without the lock. I would prefer
    it if you left it 'static' and just set the sb_dirty flag.
    raid[156]d will then update it soon enough.
12/ The tests in hot_add_disk to see if the newly added device is
    sufficiently up-to-date that the bitmap can be used to get it the
    rest of the way up-to-date must be wrong as they don't contain any
    reference to 'events'.
    You presumably want to be able to fail/remove a drive and then
    re-add it and not require a full resync. For this to work, you need
    to record an event number when the bitmap switches from "which
    blocks on active drives are not in-sync" to "which blocks active
    drives have changed since a drive failed", and when you incorporate
    a new device, only allow it to be synced based on the bitmap if the
    event counter is at least as new as the saved one (plus the checks
    you currently have).

13/ The test
+	} else if (atomic_read(&mddev->bitmap_file->f_count) > 2) {
    in set_bitmap_file is half-hearted at best. What you probably want
    is "deny_write_access". Or just check that the file is owned by
    root and isn't world writable. The check against the uuid in the
    header should be enough to ensure that operator error doesn't
    result in the one file being used for two arrays.

14/ In md_do_sync, I think you should move "cond_resched()" above

@@ -3312,6 +3506,10 @@ static void md_do_sync(mddev_t *mddev)
 			goto out;
 		}

+		/* don't worry about speed limits unless we do I/O */
+		if (!need_sync)
+			continue;
+
 		/*
 		 * this loop exits only if either when we are slower than
 		 * the 'hard' speed limit, or the system was IO-idle for

    to make sure that mdX_sync doesn't steal too much time before
    allowing a reschedule.

and the worst part about it is that the code doesn't support what I
would think would be the most widely used and hence most useful case,
and that is to store the bitmap in the 60K of space after the
superblock. If you had implemented that first, then you could have
avoided all the mucking about with files, which seems to be a
problematic area, and probably had working and accepted code by now.
Then adding support for separate files could have been layered on top.
That would have meant that each piece of submitted code was
substantially smaller and hence easier to review.

Anyway, enough for now. I'll try to review your next patch with a
little less delay, but it is a substantial effort, and that makes it
easy to put off.

NeilBrown
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
From: Paul Clements @ 2004-05-04 20:08 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil Brown wrote:
> On Wednesday March 31, Paul.Clements@SteelEye.com wrote:
> and less than a month later I replied .... I've been busy and had
> two weeks leave in there. Sorry.

Ahh, I noticed you'd been absent from the mailing lists for a bit...
Anyway, thanks for taking the time to review this...

>> Create a bitmap file:
>> --------------------
>>
>> # mdadm --create-bitmap=65536,3,580480 /tmp/bitmap --force

> Maybe:
>    mdadm --create-bitmap --chunksize=64 --delay=3 --size=580480
> ???
> while more verbose, it is also easier to remember and more in-keeping
> with the rest of mdadm's syntax.

OK...and it will probably change slightly if we're not doing bitmaps in
files...more on that below...

>> 1) an is_create flag was added to do_md_run to tell bitmap_create
>> whether we are creating or just assembling the array -- this is
>> necessary since 0.90 superblocks do not have a UUID until one is
>> generated randomly at array creation time, therefore, we must set the
>> bitmap UUID equal to this newly generated array UUID when the array is
>> created

> I think this is the wrong approach. You are justifying a design error
> by reference to a previous design error.

I agree, I don't like it either, but there is no clean solution that I
can think of...

> I think user-space should be completely responsible for creating the
> bitmap file including setting the UUID.
> Either
>  1/ add the bitmap after the array has been created.

We can't do this because the initial resync would have started.
> or > 2/ Create the array in user-space and just get the kernel to > assemble it (this is what I will almost certainly do in mdadm > once I get around to supporting version 1 superblocks). I'd gladly support version 1 superblocks, but currently the tools and kernel support for them are not complete. So in order to have working code, right now, unfortunately, my hack is a necessary evil. If there's a cleaner way to make this work, I'm certainly open to suggestions. >>3) code was added to mdadm to allow creation of arrays with >>non-persistent superblocks (also, device size calculation with >>non-persistent superblocks was fixed) >> >>4) a fix was made to the hot_remove code to allow a faulty device to be >>removed >> >>5) various typo and minor bug fixes were also included in the patches > > > > please, Please, PLEASE, keep separate patches separate. It makes them > much easier to review, and hence makes acceptance much more likely. Yes, I should have cleaned the patch up a bit...sorry about that... > In particular, your fix in 4 is, I think, wrong. I agree that there > is a problem here. I don't think your fix is correct. Could you explain what's wrong with it? I'll be glad to alter the patch if there's a better way to fix this problem. > > But, onto the patch. > > > 1/ Why bitmap_alloc_page instead of just kmalloc? That was part of Peter's original patch and I never bothered to change it. Admittedly, there probably is not a valid reason for having the cache. I'll remove it and just use kmalloc. If we find we need it later, it can be added back... > If every kernel subsystem kept its own private cache of memory > in case of desperate need, then there would be a lot of memory > wastage. Unless you have evidence that times when you need to > allocate a bitmap are frequently times when there is excessive > memory pressure, I think you should just trust kmalloc. 
> On the other hand, if you have reason to believe that the services > of kmalloc are substantially suboptimal for your needs, you should > explain why in a comment. > > 2/There is a race in bitmap_checkpage. > I assume that the required postcondition is that (if the arguments > are valid) either bitmap->bp[page].hijacked or bitmap->bp[page].map > but not both. If this is the case, then > > + if ((mappage = bitmap_alloc_page(bitmap)) == NULL) { > + PRINTK("%s: bitmap map page allocation failed, hijacking\n", > + bmname(bitmap)); > + /* failed - set the hijacked flag so that we can use the > + * pointer as a counter */ > + spin_lock_irqsave(&bitmap->lock, flags); > + bitmap->bp[page].hijacked = 1; > + goto out_unlock; > + } > > should become: > > + if ((mappage = bitmap_alloc_page(bitmap)) == NULL) { > + PRINTK("%s: bitmap map page allocation failed, hijacking\n", > + bmname(bitmap)); > + /* failed - set the hijacked flag so that we can use the > + * pointer as a counter */ > + spin_lock_irqsave(&bitmap->lock, flags); > + if (!bitmap->bp[page].map) bitmap->bp[page].hijacked = 1; > + goto out_unlock; > + } > > as someone else could have allocated a bitmap while we were trying > and failing. Yes, I'll fix that. > > > 3/ Your bmap_page / sync_page is very filesystem-specific. > > bmap_page sets page->private to a linked-list of buffers. > However page->private "belongs" to the address-space that the page > is in, which means the filesystem. Yes, you're right. > I don't know if any filesystems use page->private for anything > other than a list of buffers, but they certainly could if they > wanted to, and if they did, this code would suddenly break. XFS uses page->private differently... > > I think that if you have your heart set on being able to store the > bitmap in a file, that using a loop-back mount would be easiest. > But if you don't want to do that, at least use the correct > interface. Referring to the code in loop.c would help. 
We could not do the prepare_write/commit_write (as loop does) because of the current->journal_info limitation in jbd/ext3 (i.e., a single process cannot have two jbd transactions ongoing at a time, even though the two transactions are for different filesystems). In order to work around that limitation, we would have had to create a separate thread to do the bitmap writes, which is complex and probably too slow to be an acceptable solution. I now agree with you that (due to the aforementioned limitations) bitmap files are not going to work. I think, at least for now, we'll go with a bitmap located on a device at a certain offset. So, for a bitmap located on the md partition itself, we specify an offset of sb_offset+4096. For a standalone bitmap device/partition (or loopback mount of a file, as you suggested) we give a 0 offset. > Another alternative would be to use the approach that swapfile uses. > i.e. create a list of extents using bmap information, and then do > direct IO to the device using this extent information. > swapfile chooses to ignore any extent that is less than a page. > You might not want to do that, but you wouldn't have to. > > > 4/ I would avoid extending the file in the kernel. It is too easy > to do that in user space. Just require that the file is the > correct size. OK. That's easy enough to change. > 5/ I find it very confusing that you kmap a page, and then leave it > to some function that you call to kunmap the page (unmap_put_page > or sync_put_page). It looks very unbalanced. I would much > rather see the kunmap in the same function as the kmap. > > 6/ It seems odd that bitmap_set_state calls unmap_put_page instead > of sync_put_page. Surely you want to sync the superblock at this > point. > 7/ The existence of bitmap_get_state confuses me. Why not store the > state in the bitmap structure in bitmap_read_sb?? > > 8/ calling md_force_spare(.., 1) to force a sync is definitely > wrong. 
You appear to be assuming a raid1 array with exactly two > devices. Can't you just set recovery_cp to 0, or maybe just set > a flag somewhere and test it in md_check_recovery?? This is code that was necessary in 2.4, where it was harder to trigger a resync. I think this can be cleaned up for 2.6. > 9/ why don't you just pass "%s_bitmap" as the thread name to > md_register_thread ? As far as I can tell, it would have exactly > the same effect as the current code without requiring a kmalloc. OK > 10/ > +static void bitmap_stop_daemon(struct bitmap *bitmap) > +{ > + mdk_thread_t *daemon; > + unsigned long flags; > + > + spin_lock_irqsave(&bitmap->lock, flags); > + if (!bitmap->daemon) { > + spin_unlock_irqrestore(&bitmap->lock, flags); > + return; > + } > + daemon = bitmap->daemon; > + bitmap->daemon = NULL; > + spin_unlock_irqrestore(&bitmap->lock, flags); > + md_unregister_thread(daemon); /* destroy the thread */ > +} > > would look better as: > > +static void bitmap_stop_daemon(struct bitmap *bitmap) > +{ > + mdk_thread_t *daemon; > + unsigned long flags; > + > + spin_lock_irqsave(&bitmap->lock, flags); > + daemon = bitmap->daemon; > + bitmap->daemon = NULL; > + spin_unlock_irqrestore(&bitmap->lock, flags); > + if (daemon) > + md_unregister_thread(daemon); /* destroy the thread */ > +} OK > > 11/ md_update_sb needs to be called with the mddev locked, and I > think there are times when you call it without the lock. > I would prefer it if you left it 'static' and just set the > sb_dirty flag. raid[156]d will then update it soon > enough. I think that will be OK, as long as it doesn't open up a window for things to get into an inconsistent state if there's a crash. > 12/ The tests in hot_add_disk to see if the newly added device is > sufficiently up-to-date that the bitmap can be used to get it > the rest of the way up-to-date must be wrong as they don't > contain any reference to 'events'. 
> You presumably want to be able to fail/remove a drive and then > re-add it and not require a full resync. > For this to work, you need to record an event number when the > bitmap switches from "which blocks on active drives are not > in-sync" to "which blocks active drives have changed since a > drive failed", and when you incorporate a new device, only > allow it to be synced based on the bitmap if the event counter > is at least as new as the saved one (plus the checks you > currently have). Yes, the current code assumes only two partitions, and thus does not do this extra checking. I'll look at adding that. > 13/ The test > + } else if (atomic_read(&mddev->bitmap_file->f_count) > 2) { > in set_bitmap_file is half-hearted at best. > What you probably want is "deny_write_access". I'll check that out. > Or just check that the file is owned by root and isn't world > writable. > > The check against the uuid in the header should be enough to > ensure that operator error doesn't result in the one file > being used for two arrays. > > 14/ In md_do_sync, I think you should move "cond_resched()" above > > @@ -3312,6 +3506,10 @@ static void md_do_sync(mddev_t *mddev) > goto out; > } > > + /* don't worry about speed limits unless we do I/O */ > + if (!need_sync) > + continue; > + > /* > * this loop exits only if either when we are slower than > * the 'hard' speed limit, or the system was IO-idle for > > to make sure that mdX_sync doesn't steal too much time before > allowing a reschedule. I'll look at doing this. > > and the worst part about it is that the code doesn't support what I > would think would be the most widely used and hence most useful case, > and that is to store the bitmap in the 60K of space after the > superblock. As mentioned above, the next patch will support this configuration (as well as standalone bitmap devices/partitions). Hopefully, the next patch will be more to your liking (and much smaller, too...) 
-- Paul ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-04-29 8:41 ` Neil Brown 2004-05-04 20:08 ` Paul Clements @ 2004-06-08 20:53 ` Paul Clements 2004-06-08 22:47 ` Neil Brown ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Paul Clements @ 2004-06-08 20:53 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley Neil, Here's the latest patch...it supports bitmaps in files as well as block devices (disks or partitions), contrary to what I had stated in my previous e-mail. I've tried to address all the issues you've pointed out, and generally cleaned up and fixed the patch some more...details below... Patches available at: bitmap patch for 2.6.6: http://dsl2.external.hp.com/~jejb/md_bitmap/md_bitmap_2_32_2_6_6.diff mdadm patch against v1.6.0: http://dsl2.external.hp.com/~jejb/md_bitmap/mdadm_1_6_0-bitmap.diff (the normal parisc-linux.org URLs are not working right now, for some reason...) Thanks again, Paul Neil Brown wrote: > On Wednesday March 31, Paul.Clements@SteelEye.com wrote: > and less than a month later I replied .... I've been busy and had > two weeks leave in there. Sorry. > > >>Create a bitmap file: >>-------------------- >> >># mdadm --create-bitmap=65536,3,580480 /tmp/bitmap --force >> > > > Maybe: > mdadm --create-bitmap --chunksize=64 --delay=3 --size=580480 > ??? > while more verbose, it is also easier to remember and more in keeping > with the rest of mdadm's syntax. OK, this is done. I reused the existing --chunk, --delay, and --size options. The bitmap "file" can be either a file (e.g., /tmp/my_bitmap) or a block device (e.g., /dev/sdd10). 
>>1) an is_create flag was added to do_md_run to tell bitmap_create >>whether we are creating or just assembling the array -- this is >>necessary since 0.90 superblocks do not have a UUID until one is >>generated randomly at array creation time, therefore, we must set the >>bitmap UUID equal to this newly generated array UUID when the array is >>created > > > I think this is the wrong approach. You are justifying a design error > by reference to a previous design error. > I think user-space should be completely responsible for creating the > bitmap file including setting the UUID. > Either > 1/ add the bitmap after the array has been created. > or > 2/ Create the array in user-space and just get the kernel to > assemble it (this is what I will almost certainly do in mdadm > once I get around to supporting version 1 superblocks). I could not find another way to make this work with the existing code, so this remains as is. >>3) code was added to mdadm to allow creation of arrays with >>non-persistent superblocks (also, device size calculation with >>non-persistent superblocks was fixed) >> >>4) a fix was made to the hot_remove code to allow a faulty device to be >>removed >> >>5) various typo and minor bug fixes were also included in the patches > > > > please, Please, PLEASE, keep separate patches separate. It makes them > much easier to review, and hence makes acceptance much more likely. I've removed all the other bug fixes, spelling corrections, etc. > In particular, your fix in 4 is, I think, wrong. I agree that there > is a problem here. I don't think your fix is correct. > > But, onto the patch. > > > 1/ Why bitmap_alloc_page instead of just kmalloc? > If every kernel subsystem kept its own private cache of memory > in case of desperate need, then there would be a lot of memory > wastage. 
Unless you have evidence that times when you need to > allocate a bitmap are frequently times when there is excessive > memory pressure, I think you should just trust kmalloc. > On the other hand, if you have reason to believe that the services > of kmalloc are substantially suboptimal for your needs, you should > explain why in a comment. This has been changed to simply kmalloc/kfree. > 2/There is a race in bitmap_checkpage. > I assume that the required postcondition is that (if the arguments > are valid) either bitmap->bp[page].hijacked or bitmap->bp[page].map > but not both. If this is the case, then > > + if ((mappage = bitmap_alloc_page(bitmap)) == NULL) { > + PRINTK("%s: bitmap map page allocation failed, hijacking\n", > + bmname(bitmap)); > + /* failed - set the hijacked flag so that we can use the > + * pointer as a counter */ > + spin_lock_irqsave(&bitmap->lock, flags); > + bitmap->bp[page].hijacked = 1; > + goto out_unlock; > + } > > should become: > > + if ((mappage = bitmap_alloc_page(bitmap)) == NULL) { > + PRINTK("%s: bitmap map page allocation failed, hijacking\n", > + bmname(bitmap)); > + /* failed - set the hijacked flag so that we can use the > + * pointer as a counter */ > + spin_lock_irqsave(&bitmap->lock, flags); > + if (!bitmap->bp[page].map) bitmap->bp[page].hijacked = 1; > + goto out_unlock; > + } > > as someone else could have allocated a bitmap while we were trying > and failing. This has been fixed. > 3/ Your bmap_page / sync_page is very filesystem-specific. > > bmap_page sets page->private to a linked-list of buffers. > However page->private "belongs" to the address-space that the page > is in, which means the filesystem. > I don't know if any filesystems use page->private for anything > other than a list of buffers, but they certainly could if they > wanted to, and if they did, this code would suddenly break. 
> > I think that if you have your heart set on being able to store the > bitmap in a file, that using a loop-back mount would be easiest. > But if you don't want to do that, at least use the correct > interface. Referring to the code in loop.c would help. I've basically rewritten the bitmap file read/write code to be exactly like loop.c. This includes the current limitation of needing an extra write thread in which to perform the bitmap file writes (again, due to the jbd current->journal_info problem). > Another alternative would be to use the approach that swapfile uses. > i.e. create a list of extents using bmap information, and then do > direct IO to the device using this extent information. > swapfile chooses to ignore any extent that is less than a page. > You might not want to do that, but you wouldn't have to. > > > 4/ I would avoid extending the file in the kernel. It is too easy > to do that in user space. Just require that the file is the > correct size. The file will no longer be extended in the kernel. > 5/ I find it very confusing that you kmap a page, and then leave it > to some function that you call to kunmap the page (unmap_put_page > or sync_put_page). It looks very unbalanced. I would much > rather see the kunmap in the same function as the kmap. This has been cleaned up and simplified. > 6/ It seems odd that bitmap_set_state calls unmap_put_page instead > of sync_put_page. Surely you want to sync the superblock at this > point. There's now an explicit bitmap_update_sb where needed. > 7/ The existence of bitmap_get_state confuses me. Why not store the > state in the bitmap structure in bitmap_read_sb?? bitmap_get_state is gone...the state now gets recorded in bitmap->flags and read from there. > 8/ calling md_force_spare(.., 1) to force a sync is definitely > wrong. You appear to be assuming a raid1 array with exactly two > devices. Can't you just set recovery_cp to 0, or maybe just set > a flag somewhere and test it in md_check_recovery?? 
This has been updated to the 2.6 style of kicking off recovery (setting MD_RECOVERY_NEEDED and waking up the recovery thread). > 9/ why don't you just pass "%s_bitmap" as the thread name to > md_register_thread ? As far as I can tell, it would have exactly > the same effect as the current code without requiring a kmalloc. Right, fixed. > 10/ > +static void bitmap_stop_daemon(struct bitmap *bitmap) > +{ > + mdk_thread_t *daemon; > + unsigned long flags; > + > + spin_lock_irqsave(&bitmap->lock, flags); > + if (!bitmap->daemon) { > + spin_unlock_irqrestore(&bitmap->lock, flags); > + return; > + } > + daemon = bitmap->daemon; > + bitmap->daemon = NULL; > + spin_unlock_irqrestore(&bitmap->lock, flags); > + md_unregister_thread(daemon); /* destroy the thread */ > +} > > would look better as: > > +static void bitmap_stop_daemon(struct bitmap *bitmap) > +{ > + mdk_thread_t *daemon; > + unsigned long flags; > + > + spin_lock_irqsave(&bitmap->lock, flags); > + daemon = bitmap->daemon; > + bitmap->daemon = NULL; > + spin_unlock_irqrestore(&bitmap->lock, flags); > + if (daemon) > + md_unregister_thread(daemon); /* destroy the thread */ > +} OK, changed. > 11/ md_update_sb needs to be called with the mddev locked, and I > think there are times when you call it without the lock. > I would prefer it if you left it 'static' and just set the > sb_dirty flag. raid[156]d will then update it soon > enough. Agreed. Actually, since the bitmap state is no longer in the md superblock, this is completely unneeded and has been removed. > 12/ The tests in hot_add_disk to see if the newly added device is > sufficiently up-to-date that the bitmap can be used to get it > the rest of the way up-to-date must be wrong as they don't > contain any reference to 'events'. > You presumably want to be able to fail/remove a drive and then > re-add it and not require a full resync. 
> For this to work, you need to record an event number when the > bitmap switches from "which blocks on active drives are not > in-sync" to "which blocks active drives have changed since a > drive failed", and when you incorporate a new device, only > allow it to be synced based on the bitmap if the event counter > is at least as new as the saved one (plus the checks you > currently have). Right. The extra check and event fields in the superblock have been added. > 13/ The test > + } else if (atomic_read(&mddev->bitmap_file->f_count) > 2) { > in set_bitmap_file is half-hearted at best. > What you probably want is "deny_write_access". I now use a slightly modified version of deny_write_access... > Or just check that the file is owned by root and isn't world > writable. > > The check against the uuid in the header should be enough to > ensure that operator error doesn't result in the one file > being used for two arrays. > > 14/ In md_do_sync, I think you should move "cond_resched()" above > > @@ -3312,6 +3506,10 @@ static void md_do_sync(mddev_t *mddev) > goto out; > } > > + /* don't worry about speed limits unless we do I/O */ > + if (!need_sync) > + continue; > + > /* > * this loop exits only if either when we are slower than > * the 'hard' speed limit, or the system was IO-idle for > > to make sure that mdX_sync doesn't steal too much time before > allowing a reschedule. Done. > and the worst part about it is that the code doesn't support what I > would think would be the most widely used and hence most useful case, > and that is to store the bitmap in the 60K of space after the > superblock. Unfortunately, this type of setup performs rather abysmally (generally, about a 5-10x slowdown in write performance). If you think about what is happening to the disk head, it becomes clear why. In fact, having the intent log anywhere on the same physical media as the array components gives very bad performance. 
For this reason, I have not taken extra steps to support this configuration. If anyone is curious, this type of setup can be tested using device mapper (but not loop, because loop does not have the correct sync semantics) to map that area of the disk as a separate device and use it as a bitmap. > If you had implemented that first, then you could have avoided all the > mucking about with files, which seems to be a problematic area, and > probably had working and accepted code by now. Then adding support > for separate files could have been layered on top. > That would have meant that each piece of submitted code was > substantially smaller and hence easier to review. > > > Anyway, enough for now. I'll try to review your next patch with a > little less delay, but it is a substantial effort, and that makes it > easy to put off. > > > > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-08 20:53 ` Paul Clements @ 2004-06-08 22:47 ` Neil Brown 2004-06-14 23:39 ` Neil Brown 2004-06-15 6:27 ` Neil Brown 2 siblings, 0 replies; 34+ messages in thread From: Neil Brown @ 2004-06-08 22:47 UTC (permalink / raw) To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley On Tuesday June 8, paul.clements@steeleye.com wrote: > Neil, > > Here's the latest patch...it supports bitmaps in files as well as block > devices (disks or partitions), contrary to what I had stated in my > previous e-mail. I've tried to address all the issues you've pointed > out, and generally cleaned up and fixed the patch some more...details > below... Thanks. I've penciled it into my diary to look through this on Tuesday 15th NeilBrown ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-08 20:53 ` Paul Clements 2004-06-08 22:47 ` Neil Brown @ 2004-06-14 23:39 ` Neil Brown 2004-06-14 23:59 ` James Bottomley 2004-06-15 6:27 ` Neil Brown 2 siblings, 1 reply; 34+ messages in thread From: Neil Brown @ 2004-06-14 23:39 UTC (permalink / raw) To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley On Tuesday June 8, paul.clements@steeleye.com wrote: > Neil, > > Here's the latest patch...it supports bitmaps in files as well as block > devices (disks or partitions), contrary to what I had stated in my > previous e-mail. I've tried to address all the issues you've pointed > out, and generally cleaned up and fixed the patch some more...details > below... > > Patches available at: > > bitmap patch for 2.6.6: > http://dsl2.external.hp.com/~jejb/md_bitmap/md_bitmap_2_32_2_6_6.diff > > mdadm patch against v1.6.0: > http://dsl2.external.hp.com/~jejb/md_bitmap/mdadm_1_6_0-bitmap.diff > > (the normal parisc-linux.org URLs are not working right now, for some > reason...) Unfortunately, these dsl2 URLs aren't working today - I should have grabbed the patches earlier. I'll check again later in the day. > > >>1) an is_create flag was added to do_md_run to tell bitmap_create > >>whether we are creating or just assembling the array -- this is > >>necessary since 0.90 superblocks do not have a UUID until one is > >>generated randomly at array creation time, therefore, we must set the > >>bitmap UUID equal to this newly generated array UUID when the array is > >>created > > > > > > I think this is the wrong approach. You are justifying a design error > > by reference to a previous design error. > > I think user-space should be completely responsible for creating the > > bitmap file including setting the UUID. > > Either > > 1/ add the bitmap after the array has been created. 
> > or > > 2/ Create the array in user-space and just get the kernel to > > assemble it (this is what I will almost certainly do in mdadm > > once I get around to supporting version 1 superblocks). > > I could not find another way to make this work with the existing code, > so this remains as is. I guess that means you (or someone) needs to write some code. Creating the array, including allocating the UUID, in userspace is trivial. Just make some superblocks and write them into the right location in each device. Then assemble the array as you would any pre-existing array. > > > and the worst part about it is that the code doesn't support what I > > would think would be the most widely used and hence most useful case, > > and that is to store the bitmap in the 60K of space after the > > superblock. > > Unfortunately, this type of setup performs rather abysmally (generally, > about a 5-10x slowdown in write performance). If you think about what is > happening to the disk head, it becomes clear why. In fact, having the > intent log anywhere on the same physical media as the array components > gives very bad performance. For this reason, I have not taken extra > steps to support this configuration. If anyone is curious, this type of > setup can be tested using device mapper (but not loop, because loop does > not have the correct sync semantics) to map that area of the disk as a > separate device and use it as a bitmap. Ahhh... this explains a comment you made some time ago where I thought performance would be pretty poor and you said it wasn't. I was imagining the bitmap in the same device as the array and you weren't. I think I mentioned at the time that this could be addressed by using "plugging". When a write request arrives for a block that isn't flagged as "dirty", we flag it 'dirty' and put the request on a queue. When an "unplug" request is made, we flush all updates to the "dirty" bitmap, and then release the queued write requests. 
There could still be a noticeable performance hit, but it should be substantially less than with the current code. Everything else you mentioned sounds good. When I manage to get a copy of the patch I'll comment further. NeilBrown ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-14 23:39 ` Neil Brown @ 2004-06-14 23:59 ` James Bottomley 0 siblings, 0 replies; 34+ messages in thread From: James Bottomley @ 2004-06-14 23:59 UTC (permalink / raw) To: Neil Brown; +Cc: Paul Clements, linux-raid, ptb, mingo On Mon, 2004-06-14 at 19:39, Neil Brown wrote: > Unfortunately, these dsl2 URLs aren't working today - I should have > grabbed the patches earlier. I'll check again later in the day. I'm sorry, that's my fault. dsl2 is being decommissioned ... it was a crappy x86 box, we're replacing it with a parisc one. The correct URL should have www.parisc-linux.org in place of dsl2.external.hp.com James ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-08 20:53 ` Paul Clements 2004-06-08 22:47 ` Neil Brown 2004-06-14 23:39 ` Neil Brown @ 2004-06-15 6:27 ` Neil Brown 2004-06-17 17:57 ` Paul Clements ` (2 more replies) 2 siblings, 3 replies; 34+ messages in thread From: Neil Brown @ 2004-06-15 6:27 UTC (permalink / raw) To: Paul Clements; +Cc: linux-raid, ptb, mingo, james.bottomley On Tuesday June 8, paul.clements@steeleye.com wrote: > Neil, > > Here's the latest patch...it supports bitmaps in files as well as block > devices (disks or partitions), contrary to what I had stated in my > previous e-mail. I've tried to address all the issues you've pointed > out, and generally cleaned up and fixed the patch some more...details > below... This looks a lot better. There are a few little issues that I noticed on a quick read through: - last_block_device should return "sector_t", not "unsigned long". It would be worth checking for other places where sector_t might be needed. - bitmap_checkpage still isn't quite safe. If "hijacked" gets set while it is allocating a page (unlikely, but possible), it will exit with both hijacked set and an allocated page, which isn't right. - The event comparison + sb->events_hi >= refsb->bitmap_events_hi && + sb->events_lo >= refsb->bitmap_events_lo) { in hot_add_disk is wrong. I'll try to make time to try the patch out in a week or so to get a better feel for it. The changes to mdadm need a bit of work. You have added "--persistent" and "--non-persistent" flags for --create. This is wrong. --create always uses persistent superblocks. --build makes arrays without persistent superblocks. I don't think I like --create-bitmap. A bitmap file should always be created in the context of a particular array (partly so that the size and uuid can be set correctly). I think I would like the bitmap to be specified as a "--bitmap=filename" option to --create, --build, or --grow. 
I haven't thought this through in great detail yet, but that is my leaning. --examine-bitmap is a bit of a pain too. I think I would like --examine to figure out what it has been given to look at, and report on whatever it finds. NeilBrown ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-15 6:27 ` Neil Brown @ 2004-06-17 17:57 ` Paul Clements 2004-06-18 20:48 ` Paul Clements 2004-06-23 21:48 ` Paul Clements 2 siblings, 0 replies; 34+ messages in thread From: Paul Clements @ 2004-06-17 17:57 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley Neil Brown wrote: > The changes to mdadm need a bit of work. > > You have added "--persistent" and "--non-persistent" flags for > --create. This is wrong. > --create always uses persistent superblocks. > --build makes arrays without persistent superblocks. Yes, I think --build didn't use to work for raid1, though, which is why I went the create route. I'll fix this. > I don't think I like --create-bitmap. A bitmap file should always be > created in the context of a particular array (partly so that the size > and uuid can be set correctly). I think I would like the bitmap to be > specified as a "--bitmap=filename" option to --create, --build, or > --grow. I haven't thought this through in great detail yet, but that > is my leaning. That all sounds fine except for --build. How do we know if we're doing the initial build or just a rebuild? We need to know this so we know whether to initialize the bitmap or simply use it. So I think we still need --create-bitmap for the build command (and then build will just use the bitmap, not initialize it). For --create I can make it so that the bitmap is always initialized using the array parameters, so no separate --create-bitmap will be needed. I'll need to work out the conflict that this will create with the --chunk option, though, since --chunk is the array chunk size for --create and --chunk is the bitmap chunk size for --create-bitmap. Any suggestions? > --examine-bitmap is a bit of a pain too. -X is usually what I use...that's a lot easier to remember and type... 
> I think I would like > --examine to figure out what it has been given to look at, and report > on whatever it finds. Well, that gets into a little bit of black magic when you're talking about examining a disk or partition, which could conceivably contain both an md superblock at the end and a bitmap at the front. How do we know which one it is "supposed" to be? Or do you just want examine to print information for both? That would seem a little confusing to me (granted this would only happen when a disk changed roles between bitmap and array component). -- Paul ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-06-15  6:27 ` Neil Brown
  2004-06-17 17:57 ` Paul Clements
@ 2004-06-18 20:48 ` Paul Clements
  2004-06-23 21:48 ` Paul Clements
  2 siblings, 0 replies; 34+ messages in thread
From: Paul Clements @ 2004-06-18 20:48 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Neil, responses to your feedback on the kernel code:

Neil Brown wrote:
> On Tuesday June 8, paul.clements@steeleye.com wrote:
> This looks a lot better. There are a few little issues that I noticed
> on a quick read through:
>  - last_block_device should return "sector_t", not "unsigned long".
>    It would be worth checking for other places that sector_t might
>    be needed.

Yes, there were a couple. I've fixed them now.

>  - bitmap_checkpage still isn't quite safe. If "hijacked" gets set
>    while it is allocating a page (unlikely, but possible), it will
>    exit with both hijacked set and an allocated page, which isn't
>    right.

Yes, that's right. Fixed.

>  - The event comparison
>    +	    sb->events_hi >= refsb->bitmap_events_hi &&
>    +	    sb->events_lo >= refsb->bitmap_events_lo) {
>    in hot_add_disk is wrong.

I think it's OK. Basically, I only record bitmap events while the array
is in sync. As soon as the array goes out of sync, we continue to set
bits in the bitmap, but never clear them. This means that the bitmap is
valid for resyncing any disk that was part of the array at the time it
was last in sync. Does that make sense, or am I missing a corner case?

> I'll try to make time to try the patch out in a week or so to get a
> better feel for it.

OK, that sounds great.

Thanks,
Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-06-15  6:27 ` Neil Brown
  2004-06-17 17:57 ` Paul Clements
  2004-06-18 20:48 ` Paul Clements
@ 2004-06-23 21:48 ` Paul Clements
  2004-06-23 21:50 ` Paul Clements
  ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Paul Clements @ 2004-06-23 21:48 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley

Hi Neil,

Here's the next round of patches:

kernel patch against 2.6.7:
--------------------------
http://www.parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_32_2_6_7.diff

mdadm patch against 1.6.0:
-------------------------
http://www.parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_32_2_6_7.diff

Details below...

Neil Brown wrote:
> On Tuesday June 8, paul.clements@steeleye.com wrote:
> This looks a lot better. There are a few little issues that I noticed
> on a quick read through:
>  - last_block_device should return "sector_t", not "unsigned long".
>    It would be worth checking for other places that sector_t might
>    be needed.

Fixed.

>  - bitmap_checkpage still isn't quite safe. If "hijacked" gets set
>    while it is allocating a page (unlikely, but possible), it will
>    exit with both hijacked set and an allocated page, which isn't
>    right.

Fixed.

>  - The event comparison
>    +	    sb->events_hi >= refsb->bitmap_events_hi &&
>    +	    sb->events_lo >= refsb->bitmap_events_lo) {
>    in hot_add_disk is wrong.

This is still there, see my explanation in the previous email...

> The changes to mdadm need a bit of work.
>
> You have added "--persistent" and "--non-persistent" flags for
> --create. This is wrong.
> --create always uses persistent superblocks.
> --build makes arrays without persistent superblocks.

OK, changed. The --build command now also accepts the --bitmap option.

> I don't think I like --create-bitmap. A bitmap file should always be
> created in the context of a particular array (partly so that the size
> and uuid can be set correctly). I think I would like the bitmap to be
> specified as a "--bitmap=filename" option to --create, --build, or
> --grow. I haven't thought this through in great detail yet, but that
> is my leaning.

OK, --create-bitmap remains solely for the --build case, but is no
longer needed for --create. The --create command now does all the array
creation in userspace and simply Assembles the resulting collection of
disks. It also initialises the bitmap correctly for use with the array
(i.e., uuid and size match).

> --examine-bitmap is a bit of a pain too. I think I would like
> --examine to figure out what it has been given to look at, and report
> on whatever it finds.

This also remains as-is for now, pending your reply to my previous
emails.

Thanks,
Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-23 21:48 ` Paul Clements @ 2004-06-23 21:50 ` Paul Clements 2004-07-06 14:52 ` Paul Clements [not found] ` <40F7E50F.2040308@steeleye.com> 2 siblings, 0 replies; 34+ messages in thread From: Paul Clements @ 2004-06-23 21:50 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley Paul Clements wrote: > Hi Neil, > > Here's the next round of patches: [snip] > mdadm patch against 1.6.0: > ------------------------- > http://www.parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_32_2_6_7.diff err, this should be: http://www.parisc-linux.org/~jejb/md_bitmap/mdadm_1_6_0-bitmap-2.diff ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-06-23 21:48 ` Paul Clements 2004-06-23 21:50 ` Paul Clements @ 2004-07-06 14:52 ` Paul Clements [not found] ` <40F7E50F.2040308@steeleye.com> 2 siblings, 0 replies; 34+ messages in thread From: Paul Clements @ 2004-07-06 14:52 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, ptb, mingo, james.bottomley Paul Clements wrote: > Hi Neil, > > Here's the next round of patches: > > kernel patch against 2.6.7: > -------------------------- > http://www.parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_32_2_6_7.diff A quick note for anyone who is interested: this patch misses the poolinfo structure changes that were made to raid1 between 2.6.6 and 2.6.7. Unfortunately, the mempool interfaces use casts from void *, so this wasn't caught by the compiler. Trivial changes need to be made to r1bio_pool_free for the code to work with 2.6.7. Thanks, Paul ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <40F7E50F.2040308@steeleye.com>]
[parent not found: <16649.61212.310271.36561@cse.unsw.edu.au>]
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  [not found] ` <16649.61212.310271.36561@cse.unsw.edu.au>
@ 2004-08-10 21:37 ` Paul Clements
  2004-08-13  3:04 ` Neil Brown
  0 siblings, 1 reply; 34+ messages in thread
From: Paul Clements @ 2004-08-10 21:37 UTC (permalink / raw)
To: Neil Brown; +Cc: jejb, linux-raid

Neil,

I've implemented the improvements that you've suggested below, and
preliminary tests are showing some very good results!

The latest patches are available at:

http://www.parisc-linux.org/~jejb/md_bitmap/

Further details below...

Thanks,
Paul

Neil Brown wrote:
> It's looking a lot better. I can start focussing on deeper issues
> now.
>
> It really bothers me that all the updates are synchronous - that isn't
> a good approach for getting good performance.

Yes, that's quite true. In fact, in my simple write performance tests I
used to see a slowdown of around 30% with the bitmap file...now, the
difference is not even measurable! (This is with the bitmap file located
on a dedicated disk, in order to reduce disk head movement, which tends
to degrade performance.)

> This is how I think it should happen:
>
> When a write request arrives for a block where the corresponding bit
> is already set on disk, continue with the request (as currently
> happens).
>
> When a write request arrives for a block where the corresponding bit
> is *not* set on disk, set the bit in ram (if not already set), queue
> the write request for later handling, and queue the bitmap block to
> be written. Also mark the queue as "plugged".
>
> When an unplug request comes, send all queued bitmap blocks to disk,
> then wait for them all to complete, then send all queued raid write
> requests to disk.
>
> When a write request completes, decrement the corresponding
> counter but don't clear the "shadow" bit if the count hits zero.
> Instead flag the block as "might have unsynced-zeros".
>
> The write-back thread slowly walks around the bitmap looking for
> blocks which might have an unsynced zero. They are checked to see
> if they still do. If they do, the disk-bit is cleared and the
> disk-block is queued for writing.
>
> There might need to be a couple of refinements to this, but I think it
> is the right starting point.

I've implemented more or less what you've described above in this latest
patch.

> With this approach you wouldn't need the "bitmap_update" in r1bio_s as
> the write-back daemon doesn't have a list of things to do, but instead
> it periodically scans to see what needs to be done.

Yes, getting rid of the bitmap_update structure was a very good idea.
The code is much cleaner without that...

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-08-10 21:37 ` Paul Clements
@ 2004-08-13  3:04 ` Neil Brown
  2004-09-21  3:28 ` Paul Clements
  0 siblings, 1 reply; 34+ messages in thread
From: Neil Brown @ 2004-08-13 3:04 UTC (permalink / raw)
To: Paul Clements; +Cc: jejb, linux-raid

On Tuesday August 10, paul.clements@steeleye.com wrote:
> Neil,
>
> I've implemented the improvements that you've suggested below, and
> preliminary tests are showing some very good results!
>
> The latest patches are available at:
>
> http://www.parisc-linux.org/~jejb/md_bitmap/
>

This is definitely heading in the right direction. However I think you
have missed some of the subtleties of plugging.

Plugging should be seen as entirely the responsibility of each device.
Each device will plug its own queue if or when it thinks that is a good
idea, and will possibly unplug it spontaneously. Thus if you ever call
generic_make_request, you have to assume that the request could get
serviced at any moment.

The only interest that clients of a device have in plugging is that
they may request the device to unplug, in case it still has some old
requests in a plugged queue.

When a raid1 write request comes, your code passes that request to
generic_make_request, and schedules a bitmap update for later.
Presumably you assume that the underlying devices are or will-be
plugged and only get unplugged when ->unplug_fn is called. However this
isn't true.

What you need to do is:

In the WRITE branch of make_request, schedule the bitmap update as you
currently do, but *don't* call generic_make_request on the bio.
Instead, you should add the bio to a queue using ->bi_next for linkage.

Then in raid1_unplug, you should:
  Call bitmap_unplug (this waits for the bitmaps to be safe).
  Walk through the queue of bios and submit them with
  generic_make_request. As the bit(s) will already be marked dirty,
  the requests will go straight through.
  Finally call unplug_slaves.

That should then have the bitmap updates happening before the data
updates, and should cluster bitmap updates better.

The next step would be to stop write_page from being synchronous - I'm
not comfortable about ->unplug_fn having to wait for disk io.

This looks to be quite awkward as there is no call-back that happens
when writeback on a page is completed. You need to have a kernel thread
to wait for the pages to complete. I would get the bitmap_daemon to do
this.

write_page puts the page on a queue and wakes up bitmap_write_daemon.

bitmap_write_daemon calls prepare_write/commit_write and puts the page
on another queue and wakes up bitmap_daemon.

bitmap_daemon, whenever it is woken up:
 - checks if any clean bitmaps need to be written and puts them on the
   queue for bitmap_write_daemon
 - checks if there is a page in the queue from bitmap_write_daemon and,
   if there is, waits for it (wait_on_page_writeback). When that
   completes, puts the page and decreases a count, and goes round the
   loop.
 - if the count of pending bitmap updates hits zero, bitmap_daemon
   walks the queue of pending write requests of the array (as I
   suggested above could be done in raid1_unplug) and submits them.

Does that make sense?

I would suggest doing the first version first (leaving write_page
synchronous), and then have a go at making write_page asynchronous.

NeilBrown

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-08-13 3:04 ` Neil Brown @ 2004-09-21 3:28 ` Paul Clements 2004-09-21 19:19 ` Paul Clements 0 siblings, 1 reply; 34+ messages in thread From: Paul Clements @ 2004-09-21 3:28 UTC (permalink / raw) To: Neil Brown; +Cc: jejb, linux-raid OK, here we go again. The patch has been updated to 2.6.9-rc2 and mdadm 1.7.0, and I've implemented the suggestions you've made below... Patches: http://parisc-linux.org/~jejb/md_bitmap/ Thanks, Paul ---- Neil Brown wrote: > This is definitely heading in the right direction. However I think > you have missed some of the subtleties of plugging. [snip] > That should then have the bitmap updates happening before the > data updates, and should cluster bitmap updates better. OK, I've fixed that. > The next step would be to stop write_page from being synchronous - I'm > not comfortable about ->unplug-fn having to wait for disk io. Well, it still does have to wait for the bitmap writes to complete before letting the regular writes go, but I see your point about the writes being submitted simultaneously (nearly) rather than being completely serialized. > This looks to be quite awkward as there is not call-back that happens > when writeback on a page is completed. You need to have a kernel > thread to wait for the pages to complete. I would get the > bitmap_daemon to do this. OK, this is what I've done...there didn't seem to be a better place to do it. > Does that make sense? Certainly does, and seems to work... ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes 2004-09-21 3:28 ` Paul Clements @ 2004-09-21 19:19 ` Paul Clements 2004-10-12 2:15 ` Neil Brown 0 siblings, 1 reply; 34+ messages in thread From: Paul Clements @ 2004-09-21 19:19 UTC (permalink / raw) To: Neil Brown; +Cc: jejb, linux-raid Paul Clements wrote: > OK, here we go again. The patch has been updated to 2.6.9-rc2 and mdadm > 1.7.0, and I've implemented the suggestions you've made below... > > Patches: http://parisc-linux.org/~jejb/md_bitmap/ Well, I ran some simple write performance benchmarks today and I was stunned to find the bitmap performance was terrible. After debugging a while, I realized the problem was having the bitmap daemon do the wait_on_page_writeback...if the daemon got caught at the wrong time, it could take quite a while to come back around and do the writeback. So, I added yet another thread, the bitmap writeback daemon (whose sole purpose is to wait for page writebacks) and now things are running excellently. My simple benchmarks, which previously showed about a 25-30% slowdown when using a bitmap vs. not using one, now show only a 4% slowdown, which is pretty close to the margin of error in the test itself. Check out the new patch here: http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_37_2_6_9_rc2.diff Thanks again, Paul > Neil Brown wrote: > >> This is definitely heading in the right direction. However I think >> you have missed some of the subtleties of plugging. > > > [snip] > >> That should then have the bitmap updates happening before the >> data updates, and should cluster bitmap updates better. > > > OK, I've fixed that. > >> The next step would be to stop write_page from being synchronous - I'm >> not comfortable about ->unplug-fn having to wait for disk io. 
> > > Well, it still does have to wait for the bitmap writes to complete > before letting the regular writes go, but I see your point about the > writes being submitted simultaneously (nearly) rather than being > completely serialized. > >> This looks to be quite awkward as there is not call-back that happens >> when writeback on a page is completed. You need to have a kernel >> thread to wait for the pages to complete. I would get the >> bitmap_daemon to do this. > > > OK, this is what I've done...there didn't seem to be a better place to > do it. > >> Does that make sense? > > > Certainly does, and seems to work... > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-09-21 19:19 ` Paul Clements
@ 2004-10-12  2:15 ` Neil Brown
  2004-10-12 14:06 ` Paul Clements
  2004-11-10  0:37 ` md: persistent (file-backed) bitmap Neil Brown
  0 siblings, 2 replies; 34+ messages in thread
From: Neil Brown @ 2004-10-12 2:15 UTC (permalink / raw)
To: Paul Clements; +Cc: jejb, linux-raid

On Tuesday September 21, paul.clements@steeleye.com wrote:
> Paul Clements wrote:
> > OK, here we go again. The patch has been updated to 2.6.9-rc2 and
> > mdadm 1.7.0, and I've implemented the suggestions you've made
> > below...
> >
> > Patches: http://parisc-linux.org/~jejb/md_bitmap/
>
> Well, I ran some simple write performance benchmarks today and I was
> stunned to find the bitmap performance was terrible. After debugging a
> while, I realized the problem was having the bitmap daemon do the
> wait_on_page_writeback...if the daemon got caught at the wrong time,
> it could take quite a while to come back around and do the writeback.
> So, I added yet another thread, the bitmap writeback daemon (whose
> sole purpose is to wait for page writebacks) and now things are
> running excellently. My simple benchmarks, which previously showed
> about a 25-30% slowdown when using a bitmap vs. not using one, now
> show only a 4% slowdown, which is pretty close to the margin of error
> in the test itself. Check out the new patch here:
>
> http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_37_2_6_9_rc2.diff
>
> Thanks again,
> Paul
>

Further comments.

bitmap_events
 1/ You have inserted bitmap_events_hi/lo *before* recovery_cp, thus
    moving recovery_cp, and thus breaking backwards compatibility.
 2/ The test in hot_add_disk:
	+	if (refsb && sb && uuid_equal(sb, refsb) &&
	+	    sb->events_hi >= refsb->bitmap_events_hi &&
	+	    sb->events_lo >= refsb->bitmap_events_lo) {
	+		bitmap_invalidate = 0;
    is wrong. The events count must be compared as a 64-bit number,
    i.e. it is only meaningful to compare events_lo if both events_hi
    are equal.

pending_bio_list
 1/ Do you really need a separate pending_bio_lock, or would the
    current device_lock be adequate to the task?
 2/ I think there can be a race with new requests being added to this
    list while bitmap_unplug is running in unplug_slaves. I think you
    should "bio_list_get" before calling bitmap_unplug, so that you
    only then submit requests that were made definitely *before* the
    call to bitmap_unplug. This would have the added advantage that
    you don't need to keep claiming and dropping pending_bio_lock.

random damage
 1/ You have changed r1bio_pool_free without it being a useful change.
 2/ You have removed "static" from sync_page_io for no apparent
    reason.
 3/ You declare and set start_sector_nr, and never use it.

I have removed the "random damage" bit and merged that patch into my
collection. I would really appreciate it if any further changes could
be sent as incremental patches, as reading through a large patch like
that takes a fair bit of effort.

I'll hopefully be finishing up some modifications to mdadm this week.
I'll aim to include support for bitmaps as part of that (based on your
patch, but probably with a number of changes to match changes in
mdadm).

NeilBrown

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-10-12  2:15 ` Neil Brown
@ 2004-10-12 14:06 ` Paul Clements
  2004-10-12 21:16 ` Paul Clements
  2004-11-10  0:37 ` md: persistent (file-backed) bitmap Neil Brown
  1 sibling, 1 reply; 34+ messages in thread
From: Paul Clements @ 2004-10-12 14:06 UTC (permalink / raw)
To: Neil Brown; +Cc: jejb, linux-raid

Neil Brown wrote:
>> Paul Clements wrote:
>> itself. Check out the new patch here:
>>
>> http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_37_2_6_9_rc2.diff
>>
> Further comments.
>
> bitmap_events
>  1/ You have inserted bitmap_events_hi/lo *before* recovery_cp, thus
>     moving recovery_cp, and thus breaking backwards compatibility.

Yes. I guess when recovery_cp came along I failed to notice that...

>  2/ The test in hot_add_disk:
>	+	if (refsb && sb && uuid_equal(sb, refsb) &&
>	+	    sb->events_hi >= refsb->bitmap_events_hi &&
>	+	    sb->events_lo >= refsb->bitmap_events_lo) {
>	+		bitmap_invalidate = 0;
>     is wrong. The events count must be compared as a 64-bit number,
>     i.e. it is only meaningful to compare events_lo if both events_hi
>     are equal.

Yes, that is broken.

> pending_bio_list
>  1/ Do you really need a separate pending_bio_lock, or would the
>     current device_lock be adequate to the task?

Probably so...especially with the following change...

>  2/ I think there can be a race with new requests being added to this
>     list while bitmap_unplug is running in unplug_slaves. I think you
>     should "bio_list_get" before calling bitmap_unplug, so that you
>     only then submit requests that were made definitely *before* the
>     call to bitmap_unplug. This would have the added advantage that
>     you don't need to keep claiming and dropping pending_bio_lock.

Yes, that would make sense.

> collection. I would really appreciate it if any further changes could
> be sent as incremental patches, as reading through a large patch like
> that takes a fair bit of effort.

Yes, I certainly understand that. I appreciate all the effort you've put
into reviewing and giving suggestions. Further patches will be on top of
the current bitmap patch. I will send out an incremental patch to fix
the issues above, probably in a couple of days.

Thanks,
Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes
  2004-10-12 14:06 ` Paul Clements
@ 2004-10-12 21:16 ` Paul Clements
  0 siblings, 0 replies; 34+ messages in thread
From: Paul Clements @ 2004-10-12 21:16 UTC (permalink / raw)
To: Neil Brown; +Cc: jejb, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

Neil,

patch to fix the issues mentioned below has been tested and is attached.
Should apply on top of 2.6.9-rc2 + md_bitmap.

--
Paul

Paul Clements wrote:
> Neil Brown wrote:
>>> Paul Clements wrote:
>>> itself. Check out the new patch here:
>>>
>>> http://parisc-linux.org/~jejb/md_bitmap/md_bitmap_2_37_2_6_9_rc2.diff
>>>
>> Further comments.
>>
>> bitmap_events
>>  1/ You have inserted bitmap_events_hi/lo *before* recovery_cp, thus
>>     moving recovery_cp, and thus breaking backwards compatibility.
>
> Yes. I guess when recovery_cp came along I failed to notice that...
>
>>  2/ The test in hot_add_disk:
>>	+	if (refsb && sb && uuid_equal(sb, refsb) &&
>>	+	    sb->events_hi >= refsb->bitmap_events_hi &&
>>	+	    sb->events_lo >= refsb->bitmap_events_lo) {
>>	+		bitmap_invalidate = 0;
>>     is wrong. The events count must be compared as a 64-bit number,
>>     i.e. it is only meaningful to compare events_lo if both
>>     events_hi are equal.
>
> Yes, that is broken.
>
>> pending_bio_list
>>  1/ Do you really need a separate pending_bio_lock, or would the
>>     current device_lock be adequate to the task?
>
> Probably so...especially with the following change...
>
>>  2/ I think there can be a race with new requests being added to
>>     this list while bitmap_unplug is running in unplug_slaves. I
>>     think you should "bio_list_get" before calling bitmap_unplug, so
>>     that you only then submit requests that were made definitely
>>     *before* the call to bitmap_unplug. This would have the added
>>     advantage that you don't need to keep claiming and dropping
>>     pending_bio_lock.
>
> Yes, that would make sense.
[-- Attachment #2: md_bitmap_bugfix_2_37_2_6_9_rc2.diff --]
[-- Type: text/plain, Size: 5644 bytes --]

diff -purN --exclude-from /export/public/clemep/tmp/dontdiff linux-2.6.9-rc2-BITMAP/drivers/md/md.c linux-2.6.9-rc2-BITMAP-NEW/drivers/md/md.c
--- linux-2.6.9-rc2-BITMAP/drivers/md/md.c	Tue Sep 14 14:11:07 2004
+++ linux-2.6.9-rc2-BITMAP-NEW/drivers/md/md.c	Tue Oct 12 12:23:13 2004
@@ -2375,8 +2375,9 @@ static int hot_add_disk(mddev_t * mddev,
 	if (rdev->sb_loaded)
 		sb = (mdp_super_t *)page_address(rdev->sb_page);
 	if (refsb && sb && uuid_equal(sb, refsb) &&
-	    sb->events_hi >= refsb->bitmap_events_hi &&
-	    sb->events_lo >= refsb->bitmap_events_lo) {
+	    (sb->events_hi > refsb->bitmap_events_hi ||
+	     (sb->events_hi == refsb->bitmap_events_hi &&
+	      sb->events_lo >= refsb->bitmap_events_lo))) {
 		bitmap_invalidate = 0;
 	} else if (!mddev->persistent) { /* assume bitmap is valid */
 		bitmap_invalidate = 0;
diff -purN --exclude-from /export/public/clemep/tmp/dontdiff linux-2.6.9-rc2-BITMAP/drivers/md/raid1.c linux-2.6.9-rc2-BITMAP-NEW/drivers/md/raid1.c
--- linux-2.6.9-rc2-BITMAP/drivers/md/raid1.c	Tue Sep 14 14:13:47 2004
+++ linux-2.6.9-rc2-BITMAP-NEW/drivers/md/raid1.c	Tue Oct 12 16:23:54 2004
@@ -455,18 +455,21 @@ static void unplug_slaves(mddev_t *mddev
 	struct bio *bio;
 	unsigned long flags;
 
+	/* pull writes off the pending queue and (later) submit them */
+	spin_lock_irqsave(&conf->device_lock, flags);
+	bio = bio_list_get(&conf->pending_bio_list);
+	spin_unlock_irqrestore(&conf->device_lock, flags);
+
 	/* flush any pending bitmap writes to disk before proceeding w/ I/O */
 	if (bitmap_unplug(mddev->bitmap) != 0)
 		printk("%s: bitmap file write failed!\n", mdname(mddev));
-	/* pull writes off the pending queue and submit them */
-	spin_lock_irqsave(&conf->pending_bio_lock, flags);
-	while ((bio = bio_list_pop(&conf->pending_bio_list))) {
-		spin_unlock_irqrestore(&conf->pending_bio_lock, flags);
+	while (bio) { /* submit pending writes */
+		struct bio *next = bio->bi_next;
+
 		bio->bi_next = NULL;
 		generic_make_request(bio);
-		spin_lock_irqsave(&conf->pending_bio_lock, flags);
+		bio = next;
 	}
-	spin_unlock_irqrestore(&conf->pending_bio_lock, flags);
 
 	spin_lock_irqsave(&conf->device_lock, flags);
 	for (i=0; i<mddev->raid_disks; i++) {
@@ -666,9 +669,9 @@ static int make_request(request_queue_t
 	atomic_inc(&r1_bio->remaining);
 
 	/* queue the write...it will be submitted when we unplug */
-	spin_lock_irqsave(&conf->pending_bio_lock, flags);
+	spin_lock_irqsave(&conf->device_lock, flags);
 	bio_list_add(&conf->pending_bio_list, mbio);
-	spin_unlock_irqrestore(&conf->pending_bio_lock, flags);
+	spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 
 	if (atomic_dec_and_test(&r1_bio->remaining)) {
@@ -965,9 +968,9 @@ static void sync_request_write(mddev_t *
 		       "while resyncing!\n", mdname(mddev), err);
 
 		/* queue the write...it will be submitted when we unplug */
-		spin_lock_irqsave(&conf->pending_bio_lock, flags);
+		spin_lock_irqsave(&conf->device_lock, flags);
 		bio_list_add(&conf->pending_bio_list, wbio);
-		spin_unlock_irqrestore(&conf->pending_bio_lock, flags);
+		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 
 	if (atomic_dec_and_test(&r1_bio->remaining)) {
@@ -1307,7 +1310,6 @@ static int run(mddev_t *mddev)
 	init_waitqueue_head(&conf->wait_idle);
 	init_waitqueue_head(&conf->wait_resume);
 
-	conf->pending_bio_lock = SPIN_LOCK_UNLOCKED;
 	bio_list_init(&conf->pending_bio_list);
 
 	if (!conf->working_disks) {
diff -purN --exclude-from /export/public/clemep/tmp/dontdiff linux-2.6.9-rc2-BITMAP/include/linux/raid/md_p.h linux-2.6.9-rc2-BITMAP-NEW/include/linux/raid/md_p.h
--- linux-2.6.9-rc2-BITMAP/include/linux/raid/md_p.h	Tue Sep 14 12:12:55 2004
+++ linux-2.6.9-rc2-BITMAP-NEW/include/linux/raid/md_p.h	Tue Oct 12 12:21:12 2004
@@ -133,17 +133,20 @@ typedef struct mdp_superblock_s {
 	__u32 events_lo;	/*  8 low-order of superblock update count */
 	__u32 cp_events_hi;	/*  9 high-order of checkpoint update count */
 	__u32 cp_events_lo;	/* 10 low-order of checkpoint update count */
-	__u32 bitmap_events_hi;	/* 11 high-order of bitmap update count */
-	__u32 bitmap_events_lo;	/* 12 low-order of bitmap update count */
 #else
 	__u32 events_lo;	/*  7 low-order of superblock update count */
 	__u32 events_hi;	/*  8 high-order of superblock update count */
 	__u32 cp_events_lo;	/*  9 low-order of checkpoint update count */
 	__u32 cp_events_hi;	/* 10 high-order of checkpoint update count */
-	__u32 bitmap_events_lo;	/* 11 low-order of bitmap update count */
+#endif
+	__u32 recovery_cp;	/* 11 recovery checkpoint sector count */
+#ifdef __BIG_ENDIAN
 	__u32 bitmap_events_hi;	/* 12 high-order of bitmap update count */
+	__u32 bitmap_events_lo;	/* 13 low-order of bitmap update count */
+#else
+	__u32 bitmap_events_lo;	/* 12 low-order of bitmap update count */
+	__u32 bitmap_events_hi;	/* 13 high-order of bitmap update count */
 #endif
-	__u32 recovery_cp;	/* 13 recovery checkpoint sector count */
 
 	__u32 gstate_sreserved[MD_SB_GENERIC_STATE_WORDS - 14];	/*
diff -purN --exclude-from /export/public/clemep/tmp/dontdiff linux-2.6.9-rc2-BITMAP/include/linux/raid/raid1.h linux-2.6.9-rc2-BITMAP-NEW/include/linux/raid/raid1.h
--- linux-2.6.9-rc2-BITMAP/include/linux/raid/raid1.h	Tue Sep 14 14:09:11 2004
+++ linux-2.6.9-rc2-BITMAP-NEW/include/linux/raid/raid1.h	Tue Oct 12 12:30:03 2004
@@ -37,7 +37,6 @@ struct r1_private_data_s {
 
 	/* queue pending writes and submit them on unplug */
 	struct bio_list	pending_bio_list;
-	spinlock_t	pending_bio_lock;
 
 	/* for use when syncing mirrors: */

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: md: persistent (file-backed) bitmap 2004-10-12 2:15 ` Neil Brown 2004-10-12 14:06 ` Paul Clements @ 2004-11-10 0:37 ` Neil Brown 2004-11-10 18:28 ` Paul Clements 1 sibling, 1 reply; 34+ messages in thread From: Neil Brown @ 2004-11-10 0:37 UTC (permalink / raw) To: Paul Clements, jejb, linux-raid Hi Paul, I've been looking through the bitmap code again and found the handling of resync a bit hard to wrap my brain around. The handling of the counter here seems a bit non-obvious and possibly fragile. I would like to make a change, but thought I should bounce it off you first. I would like to change the "RESYNC_MASK" bit to mean: At least one block in this chunk is out-of-sync. Then: - When we read the bitmap from disk we set this bit and the SHADOW_MASK, but leave the counter at zero. - When we get a failed write, we set this bit, but still decrement the counter. - When we are performing a resync, we periodically clear the bit on recently completed chunks. - We only clear the SHADOW_MASK and on-disk bit when the counter hits zero *and* this bit is clear. I would find this approach a lot easier to understand. Are you OK with it? Also, I would like to move the bitmap_testbit in md_do_sync down into the personality. This should make life easier for other personalities like raid10 which use a very different approach for resync than for recovery. NeilBrown ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: md: persistent (file-backed) bitmap 2004-11-10 0:37 ` md: persistent (file-backed) bitmap Neil Brown @ 2004-11-10 18:28 ` Paul Clements 0 siblings, 0 replies; 34+ messages in thread From: Paul Clements @ 2004-11-10 18:28 UTC (permalink / raw) To: Neil Brown; +Cc: jejb, linux-raid Hi Neil, Neil Brown wrote: > I would like to change the "RESYNC_MASK" bit to mean: > At least one block in this chunk is out-of-sync. OK > Then: > - When we read the bitmap from disk we set this bit and the > SHADOW_MASK, but leave the counter at zero. > - When we get a failed write, we set this bit, but still decrement > the counter. > - When we are performing a resync, we periodically clear the bit on > recently completed chunks. I assume that this also means that the counter will get incremented before each read-for-resync is submitted. Perhaps you've already considered that? > - We only clear the SHADOW_MASK and on-disk bit when the counter > hits zero *and* this bit is clear. > > I would find this approach a lot easier to understand. Are you OK > with it? It sounds reasonable. Probably a little simpler too... > Also, I would like to move the bitmap_testbit in md_do_sync down into > the personality. This should make life easier for other > personalities like raid10 which use a very different approach for > resync than for recovery. OK, that sounds fine. -- Paul ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2004-11-10 18:28 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-29 22:51 [ANNOUNCE][PATCH 2.6] md: persistent (file-backed) bitmap and async writes Paul Clements
2004-01-30 22:52 ` Paul Clements
2004-02-09 2:51 ` Neil Brown
2004-02-09 19:45 ` Paul Clements
2004-02-10 0:04 ` Neil Brown
2004-02-10 16:20 ` Paul Clements
2004-02-10 16:57 ` Paul Clements
2004-02-13 20:58 ` Paul Clements
2004-03-05 5:06 ` Neil Brown
2004-03-05 22:05 ` Paul Clements
2004-03-31 18:38 ` Paul Clements
2004-04-28 18:10 ` Paul Clements
2004-04-28 18:53 ` Peter T. Breuer
2004-04-29 8:41 ` Neil Brown
2004-05-04 20:08 ` Paul Clements
2004-06-08 20:53 ` Paul Clements
2004-06-08 22:47 ` Neil Brown
2004-06-14 23:39 ` Neil Brown
2004-06-14 23:59 ` James Bottomley
2004-06-15 6:27 ` Neil Brown
2004-06-17 17:57 ` Paul Clements
2004-06-18 20:48 ` Paul Clements
2004-06-23 21:48 ` Paul Clements
2004-06-23 21:50 ` Paul Clements
2004-07-06 14:52 ` Paul Clements
[not found] ` <40F7E50F.2040308@steeleye.com>
[not found] ` <16649.61212.310271.36561@cse.unsw.edu.au>
2004-08-10 21:37 ` Paul Clements
2004-08-13 3:04 ` Neil Brown
2004-09-21 3:28 ` Paul Clements
2004-09-21 19:19 ` Paul Clements
2004-10-12 2:15 ` Neil Brown
2004-10-12 14:06 ` Paul Clements
2004-10-12 21:16 ` Paul Clements
2004-11-10 0:37 ` md: persistent (file-backed) bitmap Neil Brown
2004-11-10 18:28 ` Paul Clements
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).