linux-raid.vger.kernel.org archive mirror
* raid5: direct IO and md_wakeup_thread
@ 2014-09-05 17:52 Markus Stockhausen
  2014-09-08  6:46 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Markus Stockhausen @ 2014-09-05 17:52 UTC (permalink / raw)
  To: neilb@suse.de; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2421 bytes --]

Hi Neil,

I hope you are doing better after the flu. Maybe you can find some time 
to explain the following code in raid5.c to me. At least one of
the patches seems to be signed off by you. The RAID5 I'm using has 
default settings.

1) Generally speaking about md_wakeup_thread() calls: does it make
any sense to call them from within raid5d()? Or were they implemented 
at locations that might be called from the make_request() path too, 
with no further caller distinction programmed, for simplicity?
 
1) In the case of a single direct I/O writer to the /dev/md0 device the 
following code will always fire md_wakeup_thread(). That
is one reason for alternating raid5d() loops, one with data to 
process and the next without. Is that corner case intended?

static void handle_stripe(struct stripe_head *sh)
        ...
        ops_run_io(sh, &s);

        if (s.dec_preread_active) {
                /* We delay this until after ops_run_io so that if make_request
                 * is waiting on a flush, it won't continue until the writes
                 * have actually been submitted.
                 */
                atomic_dec(&conf->preread_active_stripes);
                if (atomic_read(&conf->preread_active_stripes) <
                    IO_THRESHOLD)
                        md_wakeup_thread(conf->mddev->thread);
        }

2) The md_wakeup_thread() in the following preread path seems
to be unnecessary, or will a double call wake the thread twice?

void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
                              struct list_head *temp_inactive_list, int mstwake)
{
        BUG_ON(!list_empty(&sh->lru));
        BUG_ON(atomic_read(&conf->active_stripes)==0);
        if (test_bit(STRIPE_HANDLE, &sh->state)) {
                if (test_bit(STRIPE_DELAYED, &sh->state) &&
                    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
                        list_add_tail(&sh->lru, &conf->delayed_list);
                        if (atomic_read(&conf->preread_active_stripes)
                            < IO_THRESHOLD)
                                md_wakeup_thread(conf->mddev->thread);
                } else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
                        ...
                }
                md_wakeup_thread(conf->mddev->thread);
                ...

Thanks in advance.

Markus

[-- Attachment #2: InterScan_Disclaimer.txt --]
[-- Type: text/plain, Size: 1650 bytes --]

****************************************************************************
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.

Über das Internet versandte E-Mails können unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche Willenserklärung.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

Vorstand:
Kadir Akin
Dr. Michael Höhnerbach

Vorsitzender des Aufsichtsrates:
Hans Kristian Langva

Registergericht: Amtsgericht Köln
Registernummer: HRB 52 497

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: raid5: direct IO and md_wakeup_thread
  2014-09-05 17:52 raid5: direct IO and md_wakeup_thread Markus Stockhausen
@ 2014-09-08  6:46 ` NeilBrown
  2014-09-08  7:24   ` AW: " Markus Stockhausen
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2014-09-08  6:46 UTC (permalink / raw)
  To: Markus Stockhausen; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 3686 bytes --]

On Fri, 5 Sep 2014 17:52:53 +0000 Markus Stockhausen
<stockhausen@collogia.de> wrote:

> Hi Neil,
> 
> I hope you are doing better after the flu. Maybe you can find some time 
> to explain the following code in raid5.c to me. At least one of
> the patches seems to be signed off by you. The RAID5 I'm using has 
> default settings.
> 
> 1) Generally speaking about md_wakeup_thread() calls: does it make
> any sense to call them from within raid5d()? Or were they implemented 
> at locations that might be called from the make_request() path too, 
> with no further caller distinction programmed, for simplicity?

It doesn't make sense to call md_wakeup_thread() from somewhere that is *only*
called within raid5d().  However, most of raid5d() is handle_stripe(), and
handle_stripe() is called from other places.  So it could make sense to call
md_wakeup_thread() from within handle_stripe().

make_request() very likely wants to call md_wakeup_thread().... though I'm
wondering if I understand that part of the question properly.

>  
> 1) In the case of a single direct I/O writer to the /dev/md0 device the 
> following code will always fire md_wakeup_thread(). That
> is one reason for alternating raid5d() loops, one with data to 
> process and the next without. Is that corner case intended?

Is this a problem?  We often do things just in case they are needed,
providing the cost isn't too great.

If there was a measurable cost it might make sense to 

    clear_bit(THREAD_WAKEUP, &mddev->thread->flags);

in raid5d() at the top of the while loop in raid5d().  We would need to move
the md_check_recovery() call to the end of the function then.
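If it helps to see the effect, here is a toy, single-threaded model of
that idea.  It is purely illustrative: the real md_thread()/raid5d() use
set_bit()/clear_bit() on thread->flags plus a wait queue, and all names
below are invented for the sketch.

```c
/* Toy model: thread_wakeup stands in for THREAD_WAKEUP, pending_work for
 * the stripes queued for the daemon.  All names are invented for this
 * sketch; the real code lives in drivers/md/md.c and raid5.c. */
#include <assert.h>
#include <stdbool.h>

static bool thread_wakeup;      /* models THREAD_WAKEUP in thread->flags */
static int pending_work;        /* models stripes waiting for the daemon */

/* models md_wakeup_thread(): just raise the flag */
static void wakeup_thread(void) { thread_wakeup = true; }

/* models make_request() queueing a stripe and waking the daemon */
static void queue_work(void) { pending_work++; wakeup_thread(); }

/* One daemon session.  With clear_in_loop, the flag is cleared at the top
 * of the inner loop, so wakeups the daemon fires against itself while
 * handling stripes are absorbed instead of forcing an extra empty pass. */
static int daemon_session(bool clear_in_loop)
{
    int passes = 0;
    while (thread_wakeup) {
        passes++;
        thread_wakeup = false;          /* cleared once before the run */
        for (;;) {
            if (clear_in_loop)
                thread_wakeup = false;  /* clear at the top of the loop */
            if (!pending_work)
                break;
            pending_work--;
            wakeup_thread();            /* handle_stripe() re-fires the wakeup */
        }
    }
    return passes;
}
```

A single queued write then costs two passes without the in-loop clear
(the second pass empty) but only one pass with it.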

> 
> static void handle_stripe(struct stripe_head *sh)
>         ...
>         ops_run_io(sh, &s);
> 
>         if (s.dec_preread_active) {
>                 /* We delay this until after ops_run_io so that if make_request
>                  * is waiting on a flush, it won't continue until the writes
>                  * have actually been submitted.
>                  */
>                 atomic_dec(&conf->preread_active_stripes);
>                 if (atomic_read(&conf->preread_active_stripes) <
>                     IO_THRESHOLD)
>                         md_wakeup_thread(conf->mddev->thread);
>         }
> 
> 2) The md_wakeup_thread() in the following preread path seems
> to be unnecessary, or will a double call wake the thread twice?

Unnecessary in some cases, necessary in others.  Tracking exactly when it is
necessary would be complex and error prone.
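For what it's worth, the reason a duplicate call is cheap:
md_wakeup_thread() essentially just sets the THREAD_WAKEUP bit and pokes
the thread's wait queue, so calls made while the bit is already raised
coalesce into a single daemon run.  A toy model (names invented for the
sketch, the real code is in drivers/md/md.c):

```c
#include <assert.h>
#include <stdbool.h>

static bool thread_wakeup;   /* models THREAD_WAKEUP */
static int daemon_runs;      /* counts actual runs of the daemon body */

/* models md_wakeup_thread(): setting an already-set flag is a no-op,
 * so back-to-back calls cannot wake the thread twice */
static void wakeup_thread(void) { thread_wakeup = true; }

/* models one trip around the md_thread() loop: consume the flag, run once */
static void md_thread_poll(void)
{
    if (thread_wakeup) {
        thread_wakeup = false;
        daemon_runs++;
    }
}
```

Two wakeup_thread() calls before the thread gets scheduled still produce
exactly one daemon run.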

Are you just trying to understand the code, or is there something that you
think is a problem that could be fixed?

Thanks,
NeilBrown


> 
> void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>                               struct list_head *temp_inactive_list, int mstwake)
> {
>         BUG_ON(!list_empty(&sh->lru));
>         BUG_ON(atomic_read(&conf->active_stripes)==0);
>         if (test_bit(STRIPE_HANDLE, &sh->state)) {
>                 if (test_bit(STRIPE_DELAYED, &sh->state) &&
>                     !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
>                         list_add_tail(&sh->lru, &conf->delayed_list);
>                         if (atomic_read(&conf->preread_active_stripes)
>                             < IO_THRESHOLD)
>                                 md_wakeup_thread(conf->mddev->thread);
>                 } else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
>                         ...
>                 }
>                 md_wakeup_thread(conf->mddev->thread);
>                 ...
> 
> Thanks in advance.
> 
> Markus

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* AW: raid5: direct IO and md_wakeup_thread
  2014-09-08  6:46 ` NeilBrown
@ 2014-09-08  7:24   ` Markus Stockhausen
  2014-09-09 18:48     ` Markus Stockhausen
  0 siblings, 1 reply; 4+ messages in thread
From: Markus Stockhausen @ 2014-09-08  7:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 3840 bytes --]

> From: NeilBrown [neilb@suse.de]
> Sent: Monday, 8 September 2014 08:46
> To: Markus Stockhausen
> Cc: linux-raid@vger.kernel.org
> Subject: Re: raid5: direct IO and md_wakeup_thread
> 
> On Fri, 5 Sep 2014 17:52:53 +0000 Markus Stockhausen
> <stockhausen@collogia.de> wrote:
> 
> > Hi Neil,
> >
> > I hope you are doing better after the flu. Maybe you can find some time
> > to explain the following code in raid5.c to me. At least one of
> > the patches seems to be signed off by you. The RAID5 I'm using has
> > default settings.
> >
> > 1) Generally speaking about md_wakeup_thread() calls: does it make
> > any sense to call them from within raid5d()? Or were they implemented
> > at locations that might be called from the make_request() path too,
> > with no further caller distinction programmed, for simplicity?
> 
> It doesn't make sense to call md_wakeup_thread() from somewhere that is *only*
> called within raid5d().  However, most of raid5d() is handle_stripe(), and
> handle_stripe() is called from other places.  So it could make sense to call
> md_wakeup_thread() from within handle_stripe().
> 
> make_request() very likely wants to call md_wakeup_thread().... though I'm
> wondering if I understand that part of the question properly.
> 
> >
> > 1) In the case of a single direct I/O writer to the /dev/md0 device the
> > following code will always fire md_wakeup_thread(). That
> > is one reason for alternating raid5d() loops, one with data to
> > process and the next without. Is that corner case intended?
> 
> Is this a problem?  We often do things just in case they are needed,
> providing the cost isn't too great.
> 
> If there was a measurable cost it might make sense to
> 
>     clear_bit(THREAD_WAKEUP, &mddev->thread->flags);
> 
> in raid5d() at the top of the while loop in raid5d().  We would need to move
> the md_check_recovery() call to the end of the function then.
> 
> >
> > static void handle_stripe(struct stripe_head *sh)
> >         ...
> >         ops_run_io(sh, &s);
> >
> >         if (s.dec_preread_active) {
> >                 /* We delay this until after ops_run_io so that if make_request
> >                  * is waiting on a flush, it won't continue until the writes
> >                  * have actually been submitted.
> >                  */
> >                 atomic_dec(&conf->preread_active_stripes);
> >                 if (atomic_read(&conf->preread_active_stripes) <
> >                     IO_THRESHOLD)
> >                         md_wakeup_thread(conf->mddev->thread);
> >         }
> >
> > 2) The md_wakeup_thread() in the following preread path seems
> > to be unnecessary, or will a double call wake the thread twice?
> 
> Unnecessary in some cases, necessary in others.  Tracking exactly when it is
> necessary would be complex and error prone.
> 
> Are you just trying to understand the code, or is there something that you
> think is a problem that could be fixed?

Hi Neil,

thanks for the explanations. I tried to understand the reason for the
"bad" benchmark numbers during my direct IO write tests for my rmw
patch. I only chose direct IO to avoid caching side effects. Even with 64 
threads there is still a large gap to the theoretical maximum. So I 
reduced the test to 1 writer and took some time to look at how 
raid5.c handles sync/direct writes. See my other post about only 5MB/s
of 4K direct writes to a ramdisk-backed raid5 md. 

With that insight I started to compare it to raid1.c. After some 
iterations it came to my mind that the empty raid5d() runs between
two sync writes might add extra latency, so I looked for corners 
where md_wakeup_thread() might be unnecessary. That led to the
questions above.

I'll give the clear_bit(THREAD_WAKEUP,...) a try.

Markus


^ permalink raw reply	[flat|nested] 4+ messages in thread

* AW: raid5: direct IO and md_wakeup_thread
  2014-09-08  7:24   ` AW: " Markus Stockhausen
@ 2014-09-09 18:48     ` Markus Stockhausen
  0 siblings, 0 replies; 4+ messages in thread
From: Markus Stockhausen @ 2014-09-09 18:48 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

> Hi Neil,
> 
> thanks for the explanations. I tried to understand the reason for the
> "bad" benchmark numbers during my direct IO write tests for my rmw
> patch. I only chose direct IO to avoid caching side effects. Even with 64
> threads there is still a large gap to the theoretical maximum. So I
> reduced the test to 1 writer and took some time to look at how
> raid5.c handles sync/direct writes. See my other post about only 5MB/s
> of 4K direct writes to a ramdisk-backed raid5 md.
> 
> With that insight I started to compare it to raid1.c. After some
> iterations it came to my mind that the empty raid5d() runs between
> two sync writes might add extra latency, so I looked for corners
> where md_wakeup_thread() might be unnecessary. That led to the
> questions above.
> 
> I'll give the clear_bit(THREAD_WAKEUP,...) a try.
> 
> Markus

Just an all-clear response. The issue I was analyzing came from my
VirtualBox test environment. With more than one CPU activated,
process synchronization between make_request() and raid5d() seems
to carry high penalties there. Switching back to real 
hardware with /dev/ramX shows normal throughput.

A lot to wonder about when testing the kernel ...

Markus


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-09-09 18:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-05 17:52 raid5: direct IO and md_wakeup_thread Markus Stockhausen
2014-09-08  6:46 ` NeilBrown
2014-09-08  7:24   ` AW: " Markus Stockhausen
2014-09-09 18:48     ` Markus Stockhausen
