From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: Aapo Laine <aapo.laine@shiftmail.org>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: potentially lost largeish raid5 array..
Date: Mon, 7 Nov 2011 08:58:39 +1100 [thread overview]
Message-ID: <20111107085839.1002aaeb@notabene.brown> (raw)
In-Reply-To: <CAGRgLy4StFZwkm1VeC36ZV=FEPXo3k_85b5MB==Df2QRp8nJ+Q@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12867 bytes --]
On Sat, 5 Nov 2011 14:17:50 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Hi Neil,
> I wanted to ask how the mentoring you're willing to offer would work,
> on your opinion?
Primarily you asking questions and me answering them as helpfully as I can.
Also you proposing patches and me giving useful review, and/or queuing them
for upstream.
It could also involve me suggesting things you could work on, but I think
people are more motivated to work on things they have chosen themselves, so I
don't expect that would be a large part of it.
> I had much fun developing for that *other* operating system kernel,
> and I am looking forward to do things for Linux kernel too some day.
Great!
>
> My pain at present, is the lack of kernel printouts during various
> config sequences. I have been trying to follow, for example, the
> ADD_NEW_DISK sequence and the slot_store() sequence, to understand how
> it detects whether a full-sync or not will be required. And there are
> many possibilities before you figure out the right "if"
> (non-persistent arrays, containers, external metadata....).
>
> I am used to heavily print-around all config (non-IO-path) sequences
> (such as ADD_NEW_DISK). This also helps if a particular step in the
> sequence takes a long time, due to a slow IO or something like that.
> What will be the pros/cons of doing that in md on your opinion?
I think you are suggesting adding more pr_debug() calls in md.c. Is that
correct? I don't object to that - they are fairly cheap and can easily be
compiled out completely for those who don't want them. So feel free to send
me a patch adding some pr_debug's that you find useful, and I'll let you know
what I think of them.
NeilBrown
>
> Thanks!
> Alex.
>
>
>
> On Sun, Sep 25, 2011 at 12:10 PM, NeilBrown <neilb@suse.de> wrote:
> > On Sat, 24 Sep 2011 23:57:58 +0200 Aapo Laine <aapo.laine@shiftmail.org>
> > wrote:
> >
> >> On 09/23/11 11:15, NeilBrown wrote:
> >> > On Fri, 23 Sep 2011 02:09:36 -0600 Thomas Fjellstrom<thomas@fjellstrom.ca>
> >> > wrote:
> >> >> I forgot to say, but: Thank you very much :) for the help, and your tireless
> >> >> work on md.
> >> >>
> >> >
> >> > You've very welcome .... but I felt I needed to respond to that word
> >> > "tireless".
> >> > The truth is that I am getting rather tired of md .... if anyone knows anyone
> >> > who wants to get into kernel development and is wondering where to start -
> >> > please consider whispering 'the md driver' in their ear. Plenty to do, great
> >> > mentoring possibilities, and competent linux kernel engineers with good
> >> > experience are unlikely to have much trouble finding a job ;-)
> >> >
> >> > NeilBrown
> >>
> >> Whoa this is shocking news!
> >
> > Hopefully not too shocking... I'm not planning on leaving md any time soon.
> > I do still enjoy working on it.
> > But it certainly isn't as fresh and new as it was 10 years again. It would
> > probably do both me and md a lot of good to have someone with new enthusiasm
> > and new perspectives...
> >
> >
> >>
> >> Firstly then, thank you so much for your excellent work up to now. Linux
> >> has what I believe to be the best software raid of all operating systems
> >> thanks to you. Excellent in both features, and reliability i.e. quality
> >> of code.
> >>
> >> And also the support through the list was great. I found so many
> >> problems solved already just by looking at the archives... so many
> >> people with their arse saved by you.
> >>
> >> I think everybody here is sorry to see you willing to go.
> >>
> >> Now the bad news... Regarding the MD takeover, there I think I see a
> >> problem...
> >> The MD code is very tersely commented, compared to its complexity!
> >
> > That is certainly true, but seems to be true across much of the kernel, and
> > probably most code in general (though I'm sure such a comment will lead to
> > people wanting to tell me their favourite exceptions ... so I'll start with
> > "TeX").
> >
> > This is one of the reasons I offered "mentoring" to any likely candidate.
> >
> >
> >
> >>
> >> - there is not much explanation of overall strategies, or the structure
> >> of code. Also the flow of data between the functions is not much
> >> explained most of the times.
> >>
> >> - It's not obvious to me what is the entry point for kernel processes
> >> related to MD arrays, how are they triggered and where do they run...
> >> E.g. in the past I tried to understand how did resync work, but I
> >> couldn't. I thought there was a kernel process controlling resync
> >> advancement, but I couldn't really find the boundaries of code inside
> >> which it was executing.
> >
> > md_do_sync() is the heart of the resync process. it calls into the
> > personality's sync_request() function.
> >
> > The kernel thread is started by md_check_recovery() if it appears to be
> > needed. md_check_recovery() is regularly run by each personality's main
> > controlling thread.
> >
> >>
> >> - it's not clear what the various functions do or in what occasion they
> >> are called. Except from their own name, most of them have no comments
> >> just before the definition.
> >
> > How about this:
> > - you identify some functions for which the purpose or use isn't clear
> > - I'll explain to you when/how/why they are used
> > - You create a patch which adds comments which explains it all
> > - I'll apply that patch.
> >
> > deal??
> >
> >>
> >> - the algoritms within the functions are very long and complex, but only
> >> rarely they are explained by comments. I am now seeing pieces having 5
> >> levels of if's nested one inside the other, and there are almost no
> >> comments.
> >
> > I feel your pain. I really should factor out the deeply nested levels into
> > separate functions. Sometimes I have done that but there is plenty more do
> > to. Again, I would be much more motivated to do this if I were working with
> > someone who would be directly helped by it. So if you identify specific
> > problems, it'll be a lot easier for me to help fix them.
> >
> >
> >>
> >> - last but not least, variables have very short names, and for most of
> >> them, it is not explained what they mean. This is mostly for local
> >> variables, but sometimes even for the structs which go into metadata
> >> e.g. in struct r1_private_data_s most members do not have an
> >> explanation. This is pretty serious, to me at least, for understanding
> >> the code.
> >
> > Does this help?
> >
> > diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> > index e0d676b..feb44ad 100644
> > --- a/drivers/md/raid1.h
> > +++ b/drivers/md/raid1.h
> > @@ -28,42 +28,67 @@ struct r1_private_data_s {
> > mddev_t *mddev;
> > mirror_info_t *mirrors;
> > int raid_disks;
> > +
> > + /* When choose the best device for a read (read_balance())
> > + * we try to keep sequential reads one the same device
> > + * using 'last_used' and 'next_seq_sect'
> > + */
> > int last_used;
> > sector_t next_seq_sect;
> > + /* During resync, read_balancing is only allowed on the part
> > + * of the array that has been resynced. 'next_resync' tells us
> > + * where that is.
> > + */
> > + sector_t next_resync;
> > +
> > spinlock_t device_lock;
> >
> > + /* list of 'r1bio_t' that need to be processed by raid1d, whether
> > + * to retry a read, writeout a resync or recovery block, or
> > + * anything else.
> > + */
> > struct list_head retry_list;
> > - /* queue pending writes and submit them on unplug */
> > - struct bio_list pending_bio_list;
> >
> > - /* for use when syncing mirrors: */
> > + /* queue pending writes to be submitted on unplug */
> > + struct bio_list pending_bio_list;
> >
> > + /* for use when syncing mirrors:
> > + * We don't allow both normal IO and resync/recovery IO at
> > + * the same time - resync/recovery can only happen when there
> > + * is no other IO. So when either is active, the other has to wait.
> > + * See more details description in raid1.c near raise_barrier().
> > + */
> > + wait_queue_head_t wait_barrier;
> > spinlock_t resync_lock;
> > int nr_pending;
> > int nr_waiting;
> > int nr_queued;
> > int barrier;
> > - sector_t next_resync;
> > - int fullsync; /* set to 1 if a full sync is needed,
> > - * (fresh device added).
> > - * Cleared when a sync completes.
> > - */
> > - int recovery_disabled; /* when the same as
> > - * mddev->recovery_disabled
> > - * we don't allow recovery
> > - * to be attempted as we
> > - * expect a read error
> > - */
> >
> > - wait_queue_head_t wait_barrier;
> > + /* Set to 1 if a full sync is needed, (fresh device added).
> > + * Cleared when a sync completes.
> > + */
> > + int fullsync
> >
> > - struct pool_info *poolinfo;
> > + /* When the same as mddev->recovery_disabled we don't allow
> > + * recovery to be attempted as we expect a read error.
> > + */
> > + int recovery_disabled;
> >
> > - struct page *tmppage;
> >
> > + /* poolinfo contains information about the content of the
> > + * mempools - it changes when the array grows or shrinks
> > + */
> > + struct pool_info *poolinfo;
> > mempool_t *r1bio_pool;
> > mempool_t *r1buf_pool;
> >
> > + /* temporary buffer to synchronous IO when attempting to repair
> > + * a read error.
> > + */
> > + struct page *tmppage;
> > +
> > +
> > /* When taking over an array from a different personality, we store
> > * the new thread here until we fully activate the array.
> > */
> >
> >>
> >> - ... maybe more I can't think of right now ...
> >>
> >> Your code is of excellent quality, as I wrote, I wish there were more
> >> programmers like you, but if you now want to leave, THEN I start to be
> >> worried! Would you please comment it (much) more before leaving? Fully
> >> understanding your code I think is going to take other people a lot of
> >> time otherwise, and you might not find a replacement easily and/or s/he
> >> might do mistakes.
> >
> > I'm not planning on leaving - not for quite some time anyway.
> > But I know the code so well that it is hard to see which bits need
> > documenting, and what sort of documentation would really help.
> > I would love it if you (or anyone) would review the code and point to parts
> > that particularly need improvement.
> >
> >
> >>
> >> There were times in the past when I had ideas and I wanted to contribute
> >> code, but when I looked inside MD and tried to understand where should I
> >> put my changes, I realized I wasn't able to understand what current code
> >> was doing. Maybe I am not a good enough C programmer, but I was able to
> >> change things in other occasions.
> >
> > Don't be afraid to ask... But sometimes you do need a bit of persistence
> > though. :-) Not always easy to find time for that.
> >
> >
> >>
> >>
> >> I hope you won't get these critiques bad...
> >
> > Not at all.
> >
> >> and thanks for all your efforts, really, in the name of, I think, everybody.
> >> Aapo L.
> >
> > Thanks for your valuable feedback.
> > Being able to see problems is of significant value. One of the reasons that
> > I pay close attention to this list is because it shows me where the problems
> > with md and mdadm are. People often try things that I would never even dream
> > of trying (because I know they won't work). See this helps me know where the
> > code and be improved - either so what they try does work, or so it fails more
> > gracefully and helpfully.
> >
> > Thanks,
> > NeilBrown
> >
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
prev parent reply other threads:[~2011-11-06 21:58 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-23 1:50 potentially lost largeish raid5 array Thomas Fjellstrom
2011-09-23 4:32 ` NeilBrown
2011-09-23 4:49 ` Thomas Fjellstrom
2011-09-23 4:58 ` Roman Mamedov
2011-09-23 5:10 ` Thomas Fjellstrom
2011-09-23 7:06 ` David Brown
2011-09-23 7:37 ` Thomas Fjellstrom
2011-09-23 12:56 ` Stan Hoeppner
2011-09-23 13:28 ` David Brown
2011-09-23 16:22 ` Thomas Fjellstrom
2011-09-23 23:24 ` Stan Hoeppner
2011-09-24 0:11 ` Thomas Fjellstrom
2011-09-24 12:17 ` Stan Hoeppner
2011-09-24 13:11 ` (unknown) Tomáš Dulík
2011-09-24 15:16 ` potentially lost largeish raid5 array David Brown
2011-09-24 16:38 ` Stan Hoeppner
2011-09-25 13:03 ` David Brown
2011-09-25 14:39 ` Stan Hoeppner
2011-09-25 15:18 ` David Brown
2011-09-25 23:58 ` Stan Hoeppner
2011-09-26 10:51 ` David Brown
2011-09-26 19:52 ` Stan Hoeppner
2011-09-26 20:29 ` David Brown
2011-09-26 23:28 ` Krzysztof Adamski
2011-09-27 3:53 ` Stan Hoeppner
2011-09-24 17:48 ` Thomas Fjellstrom
2011-09-24 5:59 ` Mikael Abrahamsson
2011-09-24 17:53 ` Thomas Fjellstrom
2011-09-25 18:07 ` Robert L Mathews
2011-09-26 6:08 ` Mikael Abrahamsson
2011-09-26 2:26 ` Krzysztof Adamski
2011-09-23 5:11 ` NeilBrown
2011-09-23 5:22 ` Thomas Fjellstrom
2011-09-23 8:09 ` Thomas Fjellstrom
2011-09-23 9:15 ` NeilBrown
2011-09-23 16:26 ` Thomas Fjellstrom
2011-09-25 9:37 ` NeilBrown
2011-09-24 21:57 ` Aapo Laine
2011-09-25 9:18 ` Kristleifur Daðason
2011-09-25 10:10 ` NeilBrown
2011-10-01 23:21 ` Aapo Laine
2011-10-02 17:00 ` Aapo Laine
2011-10-05 2:13 ` NeilBrown
2011-10-05 2:06 ` NeilBrown
2011-11-05 12:17 ` Alexander Lyakas
2011-11-06 21:58 ` NeilBrown [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111107085839.1002aaeb@notabene.brown \
--to=neilb@suse.de \
--cc=aapo.laine@shiftmail.org \
--cc=alex.bolshoy@gmail.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).