A commented out optimization...

Flexible I/O Tester development
 help / color / mirror / Atom feed

* A commented out optimization...
@ 2012-02-01  1:40 Steven Lang
  2012-02-01  8:22 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Lang @ 2012-02-01  1:40 UTC (permalink / raw)
  To: fio

In comparing the performance between older versions of fio and newer
versions, I noticed right away that pattern writes were slower in fio
2.0.1. ï¿½Running perf showed that most of the time spent in fio was in
fill_pattern (Over 50% of the CPU time), which was not the case in the
past.

This seems like it is worth it to have around, however a year ago it
was commented out with the following commit log:

Author: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
Committer: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
Parent: 9a793c2f91a47df348237f1a9b778253ca87ec2e (Fix race in exit of
eta/util thread)
Child:  69204d6e8830464bc98fcc28ca91412d6d360775 (Eta/disk thread uses
more than the minimum stack)
Branches: continue_on_error_type_2, master, remotes/origin/master,
remotes/origin/stable-1.x
Follows: fio-1.50-rc2
Precedes: fio-1.50-rc3

    Comment out ->buf_filled_len in pattern fill

    It's buggy, needs to be debugged. Disable for now. It can cause
    verify failures.

    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

And in the code the comments say "We need to ensure that the pattern
stores are seen before the fill length store, or we could observe
headers that aren't valid to the extent notified by the fill length".

However I'm having trouble imaging a case when that will happen, given
that buffers are filled and consumed within the context of a single
thread.

Do you happen to remember what sort of case was buggy and causing this to fail?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: A commented out optimization...
  2012-02-01  1:40 A commented out optimization Steven Lang
@ 2012-02-01  8:22 ` Jens Axboe
  2012-02-01 21:46   ` Steven Lang
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2012-02-01  8:22 UTC (permalink / raw)
  To: Steven Lang; +Cc: fio

On 02/01/2012 02:40 AM, Steven Lang wrote:
> In comparing the performance between older versions of fio and newer
> versions, I noticed right away that pattern writes were slower in fio
> 2.0.1.  Running perf showed that most of the time spent in fio was in
> fill_pattern (Over 50% of the CPU time), which was not the case in the
> past.
> 
> This seems like it is worth it to have around, however a year ago it
> was commented out with the following commit log:
> 
> Author: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
> Committer: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
> Parent: 9a793c2f91a47df348237f1a9b778253ca87ec2e (Fix race in exit of
> eta/util thread)
> Child:  69204d6e8830464bc98fcc28ca91412d6d360775 (Eta/disk thread uses
> more than the minimum stack)
> Branches: continue_on_error_type_2, master, remotes/origin/master,
> remotes/origin/stable-1.x
> Follows: fio-1.50-rc2
> Precedes: fio-1.50-rc3
> 
>     Comment out ->buf_filled_len in pattern fill
> 
>     It's buggy, needs to be debugged. Disable for now. It can cause
>     verify failures.
> 
>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> 
> And in the code the comments say "We need to ensure that the pattern
> stores are seen before the fill length store, or we could observe
> headers that aren't valid to the extent notified by the fill length".
> 
> However I'm having trouble imaging a case when that will happen, given
> that buffers are filled and consumed within the context of a single
> thread.

That's not always the case, if you are using async verifies you can have
multiple threads/CPUs looking at the same buffers.

> Do you happen to remember what sort of case was buggy and causing this
> to fail?

Let me see if I can find the thread... OK, here it is:

http://www.spinics.net/lists/fio/msg00538.html

I think the issue just dropped off the debug todo. I'd appreciate your
input :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: A commented out optimization...
  2012-02-01  8:22 ` Jens Axboe
@ 2012-02-01 21:46   ` Steven Lang
  2012-02-02  8:03     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Lang @ 2012-02-01 21:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio@vger.kernel.org

On Wed, Feb 1, 2012 at 12:22 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
> On 02/01/2012 02:40 AM, Steven Lang wrote:
>> In comparing the performance between older versions of fio and newer
>> versions, I noticed right away that pattern writes were slower in fio
>> 2.0.1.  Running perf showed that most of the time spent in fio was in
>> fill_pattern (Over 50% of the CPU time), which was not the case in the
>> past.
>>
>> This seems like it is worth it to have around, however a year ago it
>> was commented out with the following commit log:
>>
>> Author: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>> Committer: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>> Parent: 9a793c2f91a47df348237f1a9b778253ca87ec2e (Fix race in exit of
>> eta/util thread)
>> Child:  69204d6e8830464bc98fcc28ca91412d6d360775 (Eta/disk thread uses
>> more than the minimum stack)
>> Branches: continue_on_error_type_2, master, remotes/origin/master,
>> remotes/origin/stable-1.x
>> Follows: fio-1.50-rc2
>> Precedes: fio-1.50-rc3
>>
>>     Comment out ->buf_filled_len in pattern fill
>>
>>     It's buggy, needs to be debugged. Disable for now. It can cause
>>     verify failures.
>>
>>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
>>
>> And in the code the comments say "We need to ensure that the pattern
>> stores are seen before the fill length store, or we could observe
>> headers that aren't valid to the extent notified by the fill length".
>>
>> However I'm having trouble imaging a case when that will happen, given
>> that buffers are filled and consumed within the context of a single
>> thread.
>
> That's not always the case, if you are using async verifies you can have
> multiple threads/CPUs looking at the same buffers.

I thought of that; but unless I am misunderstanding the verify code
(entirely possible) the call to fill_pattern is only made when writing
data.  Doing a rw=readwrite load could potentially mix asynchronous
verify threads with calls to fill_buffer, but only the IO thread would
be touching the buf_filled_len field, setting it to 0 before reads,
and the length before writes.

I have not worked with multiprocessor PPC, so I have no idea how bad
memory ordering issues are there, however I am at a loss as to how
there would be an issue here.  My understanding of the life of an io_u
is that it is created within the IO thread, initially filled with the
base pattern (Or random data) within that thread, pulled off the free
list and re-filled, passed to the IO engine, then put back on the free
list to repeat the process.  Never, in a write-only load, is the io_u
ever touched by any other thread.  For a read load with verify_async,
the io_u could be passed off to a verify thread (But will never be
concurrently accessed in both the verify thread and the IO thread),
however the verify thread  would never update either the buffer data
or the filled len field.  (Or, really, even check it, since that only
happens on writes.)  A readwrite load with verify_async would be the
only case I could see a potential issue, but the verify thread still
never modifies either the buffer or the filled len.

So either my understanding of the fio code is mistaken, my
understanding of memory ordering errors is mistaken, or something else
I am not considering is happening.

>> Do you happen to remember what sort of case was buggy and causing this
>> to fail?
>
> Let me see if I can find the thread... OK, here it is:
>
> http://www.spinics.net/lists/fio/msg00538.html
>
> I think the issue just dropped off the debug todo. I'd appreciate your
> input :-)

Ah thanks, it seems I didn't look back far enough in the archives; I'd
assumed any thread would be closer to the commit in January.

Unfortunately if that's indeed the case, I'm not sure I can fully
understand why the bug existed to offer up any input on a fix.  I
don't even understand how this particular code could deviate far
enough from a single threaded load for memory barriers to make a
difference.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: A commented out optimization...
  2012-02-01 21:46   ` Steven Lang
@ 2012-02-02  8:03     ` Jens Axboe
  2012-02-02 19:13       ` Steven Lang
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2012-02-02  8:03 UTC (permalink / raw)
  To: Steven Lang; +Cc: fio@vger.kernel.org

On 02/01/2012 10:46 PM, Steven Lang wrote:
> On Wed, Feb 1, 2012 at 12:22 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
>> On 02/01/2012 02:40 AM, Steven Lang wrote:
>>> In comparing the performance between older versions of fio and newer
>>> versions, I noticed right away that pattern writes were slower in fio
>>> 2.0.1.  Running perf showed that most of the time spent in fio was in
>>> fill_pattern (Over 50% of the CPU time), which was not the case in the
>>> past.
>>>
>>> This seems like it is worth it to have around, however a year ago it
>>> was commented out with the following commit log:
>>>
>>> Author: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>>> Committer: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>>> Parent: 9a793c2f91a47df348237f1a9b778253ca87ec2e (Fix race in exit of
>>> eta/util thread)
>>> Child:  69204d6e8830464bc98fcc28ca91412d6d360775 (Eta/disk thread uses
>>> more than the minimum stack)
>>> Branches: continue_on_error_type_2, master, remotes/origin/master,
>>> remotes/origin/stable-1.x
>>> Follows: fio-1.50-rc2
>>> Precedes: fio-1.50-rc3
>>>
>>>     Comment out ->buf_filled_len in pattern fill
>>>
>>>     It's buggy, needs to be debugged. Disable for now. It can cause
>>>     verify failures.
>>>
>>>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
>>>
>>> And in the code the comments say "We need to ensure that the pattern
>>> stores are seen before the fill length store, or we could observe
>>> headers that aren't valid to the extent notified by the fill length".
>>>
>>> However I'm having trouble imaging a case when that will happen, given
>>> that buffers are filled and consumed within the context of a single
>>> thread.
>>
>> That's not always the case, if you are using async verifies you can have
>> multiple threads/CPUs looking at the same buffers.
> 
> I thought of that; but unless I am misunderstanding the verify code
> (entirely possible) the call to fill_pattern is only made when writing
> data.  Doing a rw=readwrite load could potentially mix asynchronous
> verify threads with calls to fill_buffer, but only the IO thread would
> be touching the buf_filled_len field, setting it to 0 before reads,
> and the length before writes.
> 
> I have not worked with multiprocessor PPC, so I have no idea how bad
> memory ordering issues are there, however I am at a loss as to how
> there would be an issue here.  My understanding of the life of an io_u
> is that it is created within the IO thread, initially filled with the
> base pattern (Or random data) within that thread, pulled off the free
> list and re-filled, passed to the IO engine, then put back on the free
> list to repeat the process.  Never, in a write-only load, is the io_u
> ever touched by any other thread.  For a read load with verify_async,
> the io_u could be passed off to a verify thread (But will never be
> concurrently accessed in both the verify thread and the IO thread),
> however the verify thread  would never update either the buffer data
> or the filled len field.  (Or, really, even check it, since that only
> happens on writes.)  A readwrite load with verify_async would be the
> only case I could see a potential issue, but the verify thread still
> never modifies either the buffer or the filled len.
> 
> So either my understanding of the fio code is mistaken, my
> understanding of memory ordering errors is mistaken, or something else
> I am not considering is happening.

I'm not really seeing it either. It bothers me, since I was remembering
some sort of test case for this. You are right in the async verifies, it
should not be an issue there. In any case, the pthread lock primitives
ought to have the appropriate memory barriers. So any addition/removal
from the freelist should be fine. And for the non-async case, a preempt
to a different CPU always implies a full memory barrier.

>>> Do you happen to remember what sort of case was buggy and causing this
>>> to fail?
>>
>> Let me see if I can find the thread... OK, here it is:
>>
>> http://www.spinics.net/lists/fio/msg00538.html
>>
>> I think the issue just dropped off the debug todo. I'd appreciate your
>> input :-)
> 
> Ah thanks, it seems I didn't look back far enough in the archives; I'd
> assumed any thread would be closer to the commit in January.
> 
> Unfortunately if that's indeed the case, I'm not sure I can fully
> understand why the bug existed to offer up any input on a fix.  I
> don't even understand how this particular code could deviate far
> enough from a single threaded load for memory barriers to make a
> difference.

I'll reenable it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: A commented out optimization...
  2012-02-02  8:03     ` Jens Axboe
@ 2012-02-02 19:13       ` Steven Lang
  2012-02-02 19:16         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Lang @ 2012-02-02 19:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio@vger.kernel.org

The only thing I can think is maybe there was some sort of other issue
where perhaps a DMA IO was happening before the memory write,
unrelated to any of the code there, but the write barrier just
happened to fix it.  But even that seems like it would be a bit of a
stretch to imagine it actually being a problem.

On Thu, Feb 2, 2012 at 12:03 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
> On 02/01/2012 10:46 PM, Steven Lang wrote:
>> On Wed, Feb 1, 2012 at 12:22 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>> On 02/01/2012 02:40 AM, Steven Lang wrote:
>>>> In comparing the performance between older versions of fio and newer
>>>> versions, I noticed right away that pattern writes were slower in fio
>>>> 2.0.1.  Running perf showed that most of the time spent in fio was in
>>>> fill_pattern (Over 50% of the CPU time), which was not the case in the
>>>> past.
>>>>
>>>> This seems like it is worth it to have around, however a year ago it
>>>> was commented out with the following commit log:
>>>>
>>>> Author: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>>>> Committer: Jens Axboe <jaxboe@fusionio.com>  2011-01-14 06:32:30
>>>> Parent: 9a793c2f91a47df348237f1a9b778253ca87ec2e (Fix race in exit of
>>>> eta/util thread)
>>>> Child:  69204d6e8830464bc98fcc28ca91412d6d360775 (Eta/disk thread uses
>>>> more than the minimum stack)
>>>> Branches: continue_on_error_type_2, master, remotes/origin/master,
>>>> remotes/origin/stable-1.x
>>>> Follows: fio-1.50-rc2
>>>> Precedes: fio-1.50-rc3
>>>>
>>>>     Comment out ->buf_filled_len in pattern fill
>>>>
>>>>     It's buggy, needs to be debugged. Disable for now. It can cause
>>>>     verify failures.
>>>>
>>>>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
>>>>
>>>> And in the code the comments say "We need to ensure that the pattern
>>>> stores are seen before the fill length store, or we could observe
>>>> headers that aren't valid to the extent notified by the fill length".
>>>>
>>>> However I'm having trouble imaging a case when that will happen, given
>>>> that buffers are filled and consumed within the context of a single
>>>> thread.
>>>
>>> That's not always the case, if you are using async verifies you can have
>>> multiple threads/CPUs looking at the same buffers.
>>
>> I thought of that; but unless I am misunderstanding the verify code
>> (entirely possible) the call to fill_pattern is only made when writing
>> data.  Doing a rw=readwrite load could potentially mix asynchronous
>> verify threads with calls to fill_buffer, but only the IO thread would
>> be touching the buf_filled_len field, setting it to 0 before reads,
>> and the length before writes.
>>
>> I have not worked with multiprocessor PPC, so I have no idea how bad
>> memory ordering issues are there, however I am at a loss as to how
>> there would be an issue here.  My understanding of the life of an io_u
>> is that it is created within the IO thread, initially filled with the
>> base pattern (Or random data) within that thread, pulled off the free
>> list and re-filled, passed to the IO engine, then put back on the free
>> list to repeat the process.  Never, in a write-only load, is the io_u
>> ever touched by any other thread.  For a read load with verify_async,
>> the io_u could be passed off to a verify thread (But will never be
>> concurrently accessed in both the verify thread and the IO thread),
>> however the verify thread  would never update either the buffer data
>> or the filled len field.  (Or, really, even check it, since that only
>> happens on writes.)  A readwrite load with verify_async would be the
>> only case I could see a potential issue, but the verify thread still
>> never modifies either the buffer or the filled len.
>>
>> So either my understanding of the fio code is mistaken, my
>> understanding of memory ordering errors is mistaken, or something else
>> I am not considering is happening.
>
> I'm not really seeing it either. It bothers me, since I was remembering
> some sort of test case for this. You are right in the async verifies, it
> should not be an issue there. In any case, the pthread lock primitives
> ought to have the appropriate memory barriers. So any addition/removal
> from the freelist should be fine. And for the non-async case, a preempt
> to a different CPU always implies a full memory barrier.
>
>>>> Do you happen to remember what sort of case was buggy and causing this
>>>> to fail?
>>>
>>> Let me see if I can find the thread... OK, here it is:
>>>
>>> http://www.spinics.net/lists/fio/msg00538.html
>>>
>>> I think the issue just dropped off the debug todo. I'd appreciate your
>>> input :-)
>>
>> Ah thanks, it seems I didn't look back far enough in the archives; I'd
>> assumed any thread would be closer to the commit in January.
>>
>> Unfortunately if that's indeed the case, I'm not sure I can fully
>> understand why the bug existed to offer up any input on a fix.  I
>> don't even understand how this particular code could deviate far
>> enough from a single threaded load for memory barriers to make a
>> difference.
>
> I'll reenable it.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: A commented out optimization...
  2012-02-02 19:13       ` Steven Lang
@ 2012-02-02 19:16         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2012-02-02 19:16 UTC (permalink / raw)
  To: Steven Lang; +Cc: fio@vger.kernel.org

On 2012-02-02 20:13, Steven Lang wrote:
> The only thing I can think is maybe there was some sort of other issue
> where perhaps a DMA IO was happening before the memory write,
> unrelated to any of the code there, but the write barrier just
> happened to fix it.  But even that seems like it would be a bit of a
> stretch to imagine it actually being a problem.

I was thinking that could've been the issue as well, but in that case it
would be a kernel (or driver) bug in that it didn't flush.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-02-02 19:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-01  1:40 A commented out optimization Steven Lang
2012-02-01  8:22 ` Jens Axboe
2012-02-01 21:46   ` Steven Lang
2012-02-02  8:03     ` Jens Axboe
2012-02-02 19:13       ` Steven Lang
2012-02-02 19:16         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox