From: raz ben yehuda <raziebe@gmail.com>
To: Neil Brown <neilb@suse.de>
Cc: linux raid <linux-raid@vger.kernel.org>
Subject: Re: Subject: [PATCH 006/009]: raid1: chunk size check in run
Date: Thu, 21 May 2009 16:32:12 +0300	[thread overview]
Message-ID: <1242912733.3598.42.camel@raz> (raw)
In-Reply-To: <18964.50795.747389.606277@notabene.brown>


On Thu, 2009-05-21 at 13:11 +1000, Neil Brown wrote:
> On Wednesday May 20, raziebe@gmail.com wrote:
> > Neil
> > First, I thank you for your effort. Now I can work at full steam on the
> > reshape on top of the new raid0 code. Currently this is what I have in
> > mind. If you have any design suggestions I would be happy to hear them
> > before the coding.
> > 
> >    I added raid0_add_hot, which:
> > 	1. Checks if the new disk size is smaller than the raid chunk size; if
> > so, reject.
> > 	2. Checks if the new disk's max_hw_sectors is smaller than the
> > raid's; if so, generate a warning but do not reject.
> >  	3. Adds the disk to the raid0 disk list and turns off its in_sync bit.
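The size check in step 1 boils down to a sector-count comparison. A minimal userspace model of that check (hot_add_size_ok and all values here are hypothetical, not the actual patch code):

```c
#include <assert.h>

/* Model of step 1: reject a new disk whose capacity, in 512-byte
 * sectors, is smaller than the array chunk size. The real code would
 * compare the rdev's size against the array's chunk size; this is
 * only the arithmetic. */
static int hot_add_size_ok(unsigned long long disk_sectors,
                           unsigned long long chunk_sectors)
{
        return disk_sectors >= chunk_sectors;
}
```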
> 
> I don't think the 'in_sync' bit is used in raid0 currently, so that
> bit seems irrelevant, but shouldn't hurt.
> > 
> > I will add raid0_check_reshape.
> >       This procedure prepares the raid for the reshape process.
> > 	1. Creates a temporary mddev with the same disks as the raid's plus
> > the new disks. This raid acts as a mere mapping, so I will be able to
> > map sectors to the new target raid in the reshape process. This means I
> > have to rework create_strip_zones and raid0_run (separate patch).
> >         2. Sets the target raid transfer size.
> > 	3. Creates an allocation scheme for reshape bio allocation; I reshape
> > in chunk-size units.
> > 	4. Creates the raid0_reshape thread for writes.
> > 	5. Wakes up the raid0_sync thread.
> 
> Do you really need a temporary mddev, or just a temporary 'conf'??
I need an mddev because I want to use map_sector and create_strip. I
certainly can fix map_sector and create_strip to work with a conf rather
than an mddev, though that would make create_strip quite cumbersome.
I will split create_strip into several independent functions.
Do you agree?
> Having to create the raid0_reshape thread just for writes is a bit
> unfortunate, but it probably is the easiest approach.  You might be
> able to get the raid0_sync thread to do them, but that would be messy
> I expect.
I will start with the easy approach, meaning a separate thread for the
writes. Once I am done, I will see how I can merge the reads and writes
to work in md_sync.
> > 
> > I will add raid0_sync: raid0_sync acts as the read side of the reshape
> > process.
> > 
> >     1. Allocates a read bio.
> >     2. Maps the bio target with find_zone and map_sector; both use
> > the old raid mappings.
> >     3. Deactivates the raid.
> >     4. Locks and waits for the raid to drain any previous IOs.
> >     5. Generates a read request.
> >     6. Releases the lock.
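The mapping in step 2 is the usual raid0 striping arithmetic. A single-zone userspace model (map_sector_model is a made-up name; the kernel's find_zone/map_sector additionally handle multiple zones and unequal disks):

```c
struct dev_sector {
        int dev;                     /* index of the component disk */
        unsigned long long sector;   /* offset within that disk */
};

/* Model of the per-zone raid0 striping arithmetic: a chunk of
 * chunk_sectors sectors is placed on the disks round-robin.  Not the
 * kernel implementation, only the math it embodies. */
static struct dev_sector map_sector_model(unsigned long long sector,
                                          int ndisks,
                                          unsigned long long chunk_sectors)
{
        unsigned long long chunk = sector / chunk_sectors;
        struct dev_sector r;

        r.dev = (int)(chunk % ndisks);
        r.sector = (chunk / ndisks) * chunk_sectors
                 + sector % chunk_sectors;
        return r;
}
```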
> 
> I think that sounds correct.
> 
> > 
> > I will add reshape_read_endio:
> > 	if the IO is successful:
> > 		add the bio to the reshape_list
> > 	else
> > 		add the bio to a retry list (how many retries?)
> 
> zero retries.  The underlying block device has done all the retries
> that are appropriate.  If you get a read error, then that block is
> gone.  Probably the best you can do is write garbage to the
> destination and report the error.
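Neil's zero-retry policy above can be modeled in a few lines (handle_read_end and the status enum are illustrative; the real endio would operate on a struct bio):

```c
#include <string.h>

enum io_status { IO_OK, IO_READ_ERROR };

/* Model of the zero-retry policy: on a read error the destination
 * buffer gets filler data and the error is reported once; the block
 * is never re-read.  Returns 1 if the data is usable. */
static int handle_read_end(enum io_status st, char *buf, size_t len,
                           int *error_reported)
{
        if (st == IO_READ_ERROR) {
                memset(buf, 0, len);  /* "garbage" stand-in */
                *error_reported = 1;
                return 0;
        }
        return 1;
}
```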
> 
> > 
> > I will add raid0_reshape:
> > 	raid0_reshape is an md_thread that polls the reshape_list and
> > commences writes based on the reads.
> > 	1. Grab a bio from the reshape list.
> > 	2. Map the sector and find the zone using the new raid mappings.
> > 	3. Set the bio direction to write.
> > 	4. Generate a write.
> > 	
> > 	If a bio is in the retry_list, retry it.
> > 	If a bio is in the active_io list, perform it.
> > 	
> > I will add a reshape_write_endio that just frees the bio and its pages.
> 
> OK (except for the retry).
> 
> > 
> > raid0_make_request
> > 	I will add a check to see if the raid is in reshape.
> > 	If so:
> > 		if the IO is in the new mappings area, we generate the IO
> > 				from the new mappings;
> > 		if the IO is in the old mappings area, we generate the IO
> > 				from the old mappings (race here, no?);
> > 		if the IO is in the current reshape active area, we push it to an
> > active_io list that will be processed by raid0_reshape.
> 
> This doesn't seem to match what you say above.
> If you don't submit a read for 'reshape' until all IO has drained,  
> then presumably you would just block any incoming IO until the current
> reshape requests have all finished.  i.e. you only ever have IO or
> reshape, but not both.
Where did I say that? I guess I wasn't clear.
> Alternately you could have a sliding window covering the area
> that is currently being reshaped.
> If an IO comes in for that area, you need to either
>   - close the window and perform the IO, or
>   - wait for the window to slide past.
> I would favour the latter.  But queueing the IO for raid0_reshape doesn't
> really gain you anything I think.
I wasn't clear enough. The "current reshape active area" is my sliding
window. I wait for the window to slide past; this is exactly what I had
in mind. So yes, this is what I am writing: a reshape_window.
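The reshape_window routing can be sketched as a three-way classification (all names here are hypothetical; in the real code the "wait" case would sleep until the window advances):

```c
/* Toy model of the sliding window: sectors below the window have
 * already been moved and use the new layout; sectors past it still
 * use the old layout; IO that lands inside the window must wait for
 * the window to slide by. */
enum route { USE_NEW, USE_OLD, WAIT_FOR_WINDOW };

static enum route classify_sector(unsigned long long sector,
                                  unsigned long long win_start,
                                  unsigned long long win_end)
{
        if (sector < win_start)
                return USE_NEW;
        if (sector >= win_end)
                return USE_OLD;
        return WAIT_FOR_WINDOW;
}
```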
> Issues that you haven't mentioned:
>   - metadata update: you need to record progress in the metadata
>     as the window slides along, in case of an unclean restart
I thought md did that for me. So it doesn't. Am I to call
md_allow_write (which calls md_update_sb)? How frequently?
>   - Unless you only schedule one chunk at a time (which would slow
>     things down, I expect), you need to ensure that you don't schedule
>     a write to a block for which the read hasn't completed yet.
Ah... yes, I thought of it but forgot. My question is: how? Should I
simply use an interruptible sleep? What do you do in raid5?
>     This is particularly an issue if you support changing the
>     chunk size.
>   - I assume you are (currently) only supporting a reshape that
>     increases the size of the array and the number of devices?
Neither changing the chunk size nor shrinking is in my spec, so no.
Maybe when I finish my studies (you can google for "offsched + raz" and follow the link...)
I will have some raid quality time.
> NeilBrown


Thread overview: 6+ messages
2009-05-19 16:04 Subject: [PATCH 006/009]: raid1: chunk size check in run raz ben yehuda
2009-05-20  1:45 ` Neil Brown
2009-05-20 13:50   ` raz ben yehuda
2009-05-21  3:11     ` Neil Brown
2009-05-21 13:32       ` raz ben yehuda [this message]
2009-05-21 11:33         ` Neil Brown
