* fiologparser.py
@ 2016-05-24 10:35 Martin Steigerwald
  2016-05-24 11:12 ` fiologparser.py Ben England
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Steigerwald @ 2016-05-24 10:35 UTC (permalink / raw)
  To: fio; +Cc: Mark Nelson, Ben England, Jens Axboe

Hello Mark, Ben,

I found fiologparser.py in fio 2.10 and for now packaged it into
/usr/share/doc/fio. Yet I'd like to place it more prominently in /usr/bin or
so… for that I would need it to have no script name ending (according to
Debian Policy). Would you be fine with having it renamed to just
fiologparser? I can provide a patch to Jens.

Also it has no manpage, only a short intro in the script source itself. Do you 
intend to provide a manpage? Otherwise I may have a go at it with help2man or 
so at some point. It appears to have quite a few options:

# ./fiologparser.py         
usage: fiologparser.py [-h] [-i INTERVAL] [-d DIVISOR] [-f] [-A] [-a] [-s]
                       FILE [FILE ...]
fiologparser.py: error: too few arguments

Care to elaborate what these are doing (besides what is mentioned in script 
header)?

It requires python-scipy it seems. Anything else?

Thank you,



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 10:35 fiologparser.py Martin Steigerwald
@ 2016-05-24 11:12 ` Ben England
  2016-05-24 14:04   ` fiologparser.py Mark Nelson
  0 siblings, 1 reply; 14+ messages in thread
From: Ben England @ 2016-05-24 11:12 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: fio, Mark Nelson, Jens Axboe

no objections here, 

There is some additional documentation in the parse_args() routine in the "help" keyword parameter to parser.add_argument.

  parser.add_argument("FILE", help="collectl log output files to parse", nargs="+")

it's really a fio-generated latency log, not a collectl log
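
A minimal sketch of how the parser could look with the FILE help text corrected as described above. The option letters are taken from the usage line earlier in this thread; the `dest` names, the types, and the help wording for -A and -a are assumptions for illustration, and -i, -d, -f and -s are left without help text because the thread does not say what they do.

```python
import argparse


def build_parser():
    # prog name assumes the proposed rename without the .py suffix
    parser = argparse.ArgumentParser(prog="fiologparser")
    # dest names and types below are hypothetical, not copied from the script
    parser.add_argument("-i", dest="interval", type=int, metavar="INTERVAL")
    parser.add_argument("-d", dest="divisor", type=float, metavar="DIVISOR")
    parser.add_argument("-f", dest="full", action="store_true")
    parser.add_argument("-A", dest="all", action="store_true",
                        help="print all statistics per interval (assumed wording)")
    parser.add_argument("-a", dest="average", action="store_true",
                        help="print the average per interval (assumed wording)")
    parser.add_argument("-s", dest="sum", action="store_true")
    # corrected help text: the input is a fio latency log, not a collectl log
    parser.add_argument("FILE", nargs="+",
                        help="fio-generated latency log files to parse")
    return parser


# the log file name is a made-up example
args = build_parser().parse_args(["-a", "-i", "1000", "job_clat.1.log"])
print(args.average, args.interval, args.FILE)
```

Running the script with -h then prints the corrected description, which is also what help2man would pick up for a manpage.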

not aware of any dependencies except python-scipy.  It doesn't work with python3 yet.

Additional documentation for the -A option is in:

https://github.com/axboe/fio/pull/177

-ben



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 11:12 ` fiologparser.py Ben England
@ 2016-05-24 14:04   ` Mark Nelson
  2016-05-24 14:11     ` fiologparser.py Jens Axboe
  2016-05-24 15:22     ` fiologparser.py Ben England
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Nelson @ 2016-05-24 14:04 UTC (permalink / raw)
  To: Ben England, Martin Steigerwald; +Cc: fio, Mark Nelson, Jens Axboe

I'm totally fine with removing .py from the name unless anyone else has 
objections.  I hadn't really thought about writing a man page, but that 
would be great if you want to give it a go.

Let's see if we can remove the numpy and scipy dependencies.  It looks 
like we are just using it for min/average/median/max/percentile 
calculations.  It would be nice if users didn't need anything other than 
argparse.

Mark



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 14:04   ` fiologparser.py Mark Nelson
@ 2016-05-24 14:11     ` Jens Axboe
  2016-05-24 14:17       ` fiologparser.py Mark Nelson
  2016-05-24 15:22     ` fiologparser.py Ben England
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2016-05-24 14:11 UTC (permalink / raw)
  To: Mark Nelson, Ben England, Martin Steigerwald; +Cc: fio, Mark Nelson

On 05/24/2016 08:04 AM, Mark Nelson wrote:
> I'm totally fine with removing .py from the name unless anyone else has
> objections.  I hadn't really thought about writing a man page, but that
> would be great if you want to give it a go.
>
> Let's see if we can remove the numpy and scipy dependencies.  It looks
> like we are just using it for min/average/median/max/percentile
> calculations.  It would be nice if users didn't need anything other than
> argparse.

I haven't looked at it, but if it's just for min/av/median/etc and 
percentiles, that can be trivially hand rolled.
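
As a rough illustration of "hand rolled", the sketch below computes min, average, median, max and a percentile using only built-ins. The function names are hypothetical and this is not the code from the script; the percentile uses linear interpolation between the two nearest ranks, which is one common definition among several.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    # fractional rank into the sorted list, 0 .. len-1
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize(values):
    """Return the statistics the thread mentions, without numpy or scipy."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "avg": sum(ordered) / float(len(ordered)),
        "median": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "max": ordered[-1],
    }


stats = summarize([4, 1, 3, 2])
print(stats)
```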

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 14:11     ` fiologparser.py Jens Axboe
@ 2016-05-24 14:17       ` Mark Nelson
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Nelson @ 2016-05-24 14:17 UTC (permalink / raw)
  To: Jens Axboe, Ben England, Martin Steigerwald; +Cc: fio, Mark Nelson

On 05/24/2016 09:11 AM, Jens Axboe wrote:
> On 05/24/2016 08:04 AM, Mark Nelson wrote:
>> I'm totally fine with removing .py from the name unless anyone else has
>> objections.  I hadn't really thought about writing a man page, but that
>> would be great if you want to give it a go.
>>
>> Let's see if we can remove the numpy and scipy dependencies.  It looks
>> like we are just using it for min/average/median/max/percentile
>> calculations.  It would be nice if users didn't need anything other than
>> argparse.
>
> I haven't looked at it, but if it's just for min/av/median/etc and
> percentiles, that can be trivially hand rolled.
>

Exactly my thoughts!

Mark


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 14:04   ` fiologparser.py Mark Nelson
  2016-05-24 14:11     ` fiologparser.py Jens Axboe
@ 2016-05-24 15:22     ` Ben England
  2016-05-24 15:28       ` fiologparser.py Jens Axboe
  2016-05-25  7:20       ` fiologparser.py Martin Steigerwald
  1 sibling, 2 replies; 14+ messages in thread
From: Ben England @ 2016-05-24 15:22 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Martin Steigerwald, fio, Mark Nelson, Jens Axboe



----- Original Message -----
> From: "Mark Nelson" <mark.a.nelson@gmail.com>
> To: "Ben England" <bengland@redhat.com>, "Martin Steigerwald" <ms@teamix.de>
> Cc: fio@vger.kernel.org, "Mark Nelson" <mnelson@redhat.com>, "Jens Axboe" <axboe@kernel.dk>
> Sent: Tuesday, May 24, 2016 10:04:14 AM
> Subject: Re: fiologparser.py
> 
> Let's see if we can remove the numpy and scipy dependencies.  It looks
> like we are just using it for min/average/median/max/percentile
> calculations.  It would be nice if users didn't need anything other than
> argparse.
> 

Just curious, why is scipy a problem?  Is it because CBT isn't a package so you don't get dependencies handled when you install it?  You are correct, it's easy to remove the dependencies, I just didn't know it was causing problems for people.  You can get percentiles from just sorting the sample values and indexing into the array at the appropriate offset, I was just trying to re-use existing classes.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 15:22     ` fiologparser.py Ben England
@ 2016-05-24 15:28       ` Jens Axboe
  2016-05-24 15:35         ` fiologparser.py Ben England
  2016-05-25  7:20       ` fiologparser.py Martin Steigerwald
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2016-05-24 15:28 UTC (permalink / raw)
  To: Ben England, Mark Nelson; +Cc: Martin Steigerwald, fio, Mark Nelson

On 05/24/2016 09:22 AM, Ben England wrote:
>
>
> ----- Original Message -----
>> From: "Mark Nelson" <mark.a.nelson@gmail.com>
>> To: "Ben England" <bengland@redhat.com>, "Martin Steigerwald" <ms@teamix.de>
>> Cc: fio@vger.kernel.org, "Mark Nelson" <mnelson@redhat.com>, "Jens Axboe" <axboe@kernel.dk>
>> Sent: Tuesday, May 24, 2016 10:04:14 AM
>> Subject: Re: fiologparser.py
>>
>> Let's see if we can remove the numpy and scipy dependencies.  It looks
>> like we are just using it for min/average/median/max/percentile
>> calculations.  It would be nice if users didn't need anything other than
>> argparse.
>>
>
> Just curious, why is scipy a problem?  Is it because CBT isn't a
> package so you don't get dependencies handled when you install it?  You
> are correct, it's easy to remove the dependencies, I just didn't know it
> was causing problems for people.  You can get percentiles from just
> sorting the sample values and indexing into the array at the appropriate
> offset, I was just trying to re-use existing classes.

It's not necessarily a problem, but the fewer dependencies you have, the
easier it is for people to use. I do the same for fio: I try to have as
few external dependencies as possible. Remember, not everybody is
running on Linux...

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 15:28       ` fiologparser.py Jens Axboe
@ 2016-05-24 15:35         ` Ben England
  2016-05-24 16:20           ` fiologparser.py Mark Nelson
  0 siblings, 1 reply; 14+ messages in thread
From: Ben England @ 2016-05-24 15:35 UTC (permalink / raw)
  To: Jens Axboe, Mark Nelson; +Cc: Martin Steigerwald, fio, Mark Nelson

OK we'll remove the dependencies, I still want to have the -A option supported.
-ben



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 15:35         ` fiologparser.py Ben England
@ 2016-05-24 16:20           ` Mark Nelson
  2016-05-24 18:38             ` fiologparser.py Jeff Furlong
  2016-05-24 20:47             ` fiologparser.py Ben England
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Nelson @ 2016-05-24 16:20 UTC (permalink / raw)
  To: Ben England, Jens Axboe; +Cc: Martin Steigerwald, fio, Mark Nelson

I've got a version that removes the dependency and appears to return the 
same values:

https://github.com/axboe/fio/pull/181

Going through the code though, it looks like the -A values are computed 
differently than in the other original functions.  In the original 
get_contribution function, all samples within the bounds are counted, 
along with samples that are only partially within the bounds.  Each 
sample is weighted based on the duration it overlapped with the sample 
period:

https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L195-L198

for -A, only the samples that are totally within the bounds are counted, 
and are weighted equally despite how much of the period was spent in 
that sample:

https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L173

Thus if you look at say the average from -a:

fiologparser.py -a *clat*

1000, 11582.770
2000, 14033.844
3000, 17087.446
4000, 17946.245
5000, 14554.196
6000, 14407.804
7000, 15218.106
8000, 15157.951

the results are quite a bit different from -A:

fiologparser.py -A *clat* | tr -s "," " " | cut -f1,4 -d" "

0.000000 11902.719298
1000.000000 13247.750000
2000.000000 14270.549020
3000.000000 15092.192308
4000.000000 14127.472727
5000.000000 12880.137931
6000.000000 15296.735849
7000.000000 14857.306122
8000.000000 14854.766667
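
The two weighting schemes described above can be sketched as follows. This is an illustration under assumptions, not the code behind the links: each sample is modelled as a hypothetical (start, end, value) tuple, and the weighted average is normalised by the total overlap time.

```python
def overlap_weighted_average(samples, start, end):
    """Average where each sample counts in proportion to its overlap
    with the interval [start, end)."""
    total = 0.0
    weight_sum = 0.0
    for s_start, s_end, value in samples:
        overlap = min(s_end, end) - max(s_start, start)
        if overlap <= 0:
            continue  # sample lies entirely outside the interval
        total += value * overlap
        weight_sum += overlap
    return total / weight_sum if weight_sum else None


def contained_only_average(samples, start, end):
    """Average over samples fully inside the interval, all weighted equally."""
    inside = [v for s, e, v in samples if s >= start and e <= end]
    return sum(inside) / float(len(inside)) if inside else None


# one sample inside the first second, one straddling the 1000 ms boundary
samples = [(0, 500, 100.0), (500, 1500, 300.0)]
weighted = overlap_weighted_average(samples, 0, 1000)
contained = contained_only_average(samples, 0, 1000)
print(weighted, contained)
```

With this toy input the straddling sample pulls the weighted average up, while the contained-only average ignores it completely, which is the kind of gap visible between the -a and -A columns.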

Mark




^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: fiologparser.py
  2016-05-24 16:20           ` fiologparser.py Mark Nelson
@ 2016-05-24 18:38             ` Jeff Furlong
  2016-05-24 20:47             ` fiologparser.py Ben England
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff Furlong @ 2016-05-24 18:38 UTC (permalink / raw)
  To: Mark Nelson, Ben England, Jens Axboe
  Cc: Martin Steigerwald, fio@vger.kernel.org, Mark Nelson

You may want to consider using in place sort functions.  list.sort() is more efficient than s=sorted(list) and will greatly reduce DRAM usage for large latency logs.  On Linux you can check the python DRAM usage with resource.getrusage(resource.RUSAGE_SELF).ru_maxrss, e.g. before and after your sort statements.
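
A small sketch of the measurement suggested above, assuming a Unix-like system where the `resource` module is available. The list of random values stands in for a parsed latency log; note that `ru_maxrss` is a peak value (reported in kilobytes on Linux), so it can only stay the same or grow between the two readings.

```python
import random
import resource


def peak_rss():
    # peak resident set size of this process; kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


# stand-in data; a real run would use the values read from the fio log
latencies = [random.random() for _ in range(200000)]

before = peak_rss()
latencies.sort()  # sorts in place, no second list is allocated
after = peak_rss()

print("peak RSS before sort: %d, after sort: %d" % (before, after))
```

Replacing `latencies.sort()` with `s = sorted(latencies)` and rerunning gives a comparison point, since `sorted()` has to build a second list of the same length.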

Regards,
Jeff



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 16:20           ` fiologparser.py Mark Nelson
  2016-05-24 18:38             ` fiologparser.py Jeff Furlong
@ 2016-05-24 20:47             ` Ben England
  2016-05-25  2:04               ` fiologparser.py Mark Nelson
  1 sibling, 1 reply; 14+ messages in thread
From: Ben England @ 2016-05-24 20:47 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Jens Axboe, Martin Steigerwald, fio, Mark Nelson

Mark, I didn't notice the sample weighting code before.  Weighting of samples might work for averaging, but it doesn't work for percentiles, min or max provided by -A option.  I guess for min this won't be an issue generally, since min-latency samples will probably fall entirely within a time interval.  But for max or higher percentiles it will *definitely* be an issue.   For example, a really high latency sample could be the max for a whole range of time intervals.   

To compute percentiles, we can sort (by response time) the samples that *overlap the time interval* and then index into the python list something like this (ignoring boundary conditions):

 def get_percentile(sample_list, percentile):
   # sample_list must already be sorted by response time
   return sample_list[len(sample_list) * percentile // 100]

min would be first array element in sample_list, 
max would be last array element in sample_list. 
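
A runnable variant of the idea above with the boundary conditions handled, as a sketch only. The clamping of the index and the error on an empty list are choices made for this example, not something the thread specifies.

```python
def get_percentile(sample_list, percentile):
    """Nearest-rank style percentile of a list sorted by response time."""
    if not sample_list:
        raise ValueError("no samples in this interval")
    index = len(sample_list) * percentile // 100
    # clamp so that percentile=100 maps to the last element
    index = min(max(index, 0), len(sample_list) - 1)
    return sample_list[index]


sample_list = [7, 3, 10, 1, 5, 9, 2, 8, 4, 6]
sample_list.sort()  # in place, as suggested earlier in the thread

p50 = get_percentile(sample_list, 50)
p90 = get_percentile(sample_list, 90)
print(sample_list[0], p50, p90, sample_list[-1])
```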

And I'll definitely try using .sort instead of sorted(), thx Jeff.

make sense?

-ben




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fiologparser.py
  2016-05-24 20:47             ` fiologparser.py Ben England
@ 2016-05-25  2:04               ` Mark Nelson
  2016-05-25  8:58                 ` fiologparser.py Martin Steigerwald
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Nelson @ 2016-05-25  2:04 UTC (permalink / raw)
  To: Ben England; +Cc: Jens Axboe, Martin Steigerwald, fio, Mark Nelson



On 05/24/2016 03:47 PM, Ben England wrote:
> Mark, I didn't notice the sample weighting code before.  Weighting of samples might work for averaging, but it doesn't work for percentiles, min or max provided by -A option.  I guess for min this won't be an issue generally, since min-latency samples will probably fall entirely within a time interval.  But for max or higher percentiles it will *definitely* be an issue.   For example, a really high latency sample could be the max for a whole range of time intervals.

I went back and reworked the print and per-interval functions so that 
they are part of a Printer class and Interval class respectively.  It 
cleaned the code up pretty nicely.  I was also able to integrate the 
"-A" code to use a lot of the existing statistics and formatting code. 
It now supports the "-d" flag for example.

As part of that I took a stab at making a weighted implementation for 
percentiles (and as a result median).  The basic idea is to sort samples 
by value but then iterate over samples by weight to close in on the 
percentile boundary.  Once the samples that straddle the percentile 
boundary are found, take a weighted average of the two samples based 
inversely on their closeness to the boundary.
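
One possible reading of the weighted percentile described above, written as a sketch and not taken from the commit linked below. Samples are hypothetical (value, weight) pairs; each sample is placed at the midpoint of its share of the cumulative weight, and the result is interpolated between the two samples that straddle the percentile boundary.

```python
def weighted_percentile(samples, percentile):
    """samples: list of (value, weight) pairs with weight > 0."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples, key=lambda s: s[0])  # sort by value
    total = float(sum(w for _, w in ordered))
    target = total * percentile / 100.0

    # position of each sample: midpoint of its slice of cumulative weight
    positions = []
    cumulative = 0.0
    for _, weight in ordered:
        positions.append(cumulative + weight / 2.0)
        cumulative += weight

    if target <= positions[0]:
        return float(ordered[0][0])
    if target >= positions[-1]:
        return float(ordered[-1][0])

    # find the pair straddling the boundary and interpolate between them,
    # so the closer sample contributes more to the result
    for i in range(1, len(ordered)):
        if target <= positions[i]:
            lo_val, hi_val = ordered[i - 1][0], ordered[i][0]
            frac = (target - positions[i - 1]) / (positions[i] - positions[i - 1])
            return lo_val + (hi_val - lo_val) * frac


equal = weighted_percentile([(10, 1), (20, 1), (30, 1), (40, 1)], 50)
skewed = weighted_percentile([(10, 1), (20, 3)], 50)
print(equal, skewed)
```

With equal weights the median falls halfway between the two middle values; when the larger sample carries three times the weight, the median moves toward it.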

I do think it's really important to count samples with overlapping 
boundaries.  In the min case you otherwise disregard the min values that 
are spread over long time durations (i.e. when IOs stall).  In the max 
case, you potentially lose out on high throughput samples at edge 
boundaries.

I tried the old code and new code on a sample I had.  There's a pretty 
big difference in the number of samples utilized (or partially utilized) 
per interval.

old:

> start-time, samples, min, avg, median, 90%, 95%, 99%, max
> 0.000000, 8, 169631.000000, 321862.500000, 363155.000000, 417325.500000, 418426.250000, 419306.850000, 419527.000000
> 1000.000000, 8, 217273.000000, 324114.750000, 262548.000000, 449062.800000, 456610.900000, 462649.380000, 464159.000000
> 2000.000000, 8, 252437.000000, 351356.000000, 309912.500000, 468551.400000, 470426.700000, 471926.940000, 472302.000000
> 3000.000000, 8, 147123.000000, 315987.375000, 295690.500000, 451860.200000, 457549.100000, 462100.220000, 463238.000000
> 4000.000000, 8, 152847.000000, 325890.875000, 352656.000000, 442708.300000, 446184.150000, 448964.830000, 449660.000000
> 5000.000000, 7, 152547.000000, 333048.428571, 285577.000000, 465428.800000, 469807.900000, 473311.180000, 474187.000000

New:

> end-time, samples, min, avg, median, 90%, 95%, 99%, max
> 1000.000, 16, 169631.000, 321863.134, 298029.136, 451210.153, 455823.097, 457210.922, 457836.000
> 2000.000, 24, 184826.000, 341609.250, 285337.006, 462780.936, 465093.032, 465706.770, 466011.000
> 3000.000, 24, 88867.000, 312228.872, 298560.686, 466730.845, 469928.578, 471566.768, 472302.000
> 4000.000, 24, 88867.000, 309359.155, 278879.166, 458966.926, 462427.823, 462987.178, 463238.000
> 5000.000, 24, 137593.000, 326864.166, 317893.305, 449518.978, 455424.867, 459333.936, 461407.000
> 6000.000, 23, 131237.000, 340960.370, 319615.167, 460959.116, 468513.304, 472427.275, 474187.000

Code is here if anyone wants to critique/flame:

https://github.com/markhpc/fio/commit/19943e4dce34233bc776ed868d12c4c03b5f98ec

Mark

>
> To compute percentiles, we can sort (by response time) the samples that *overlap the time interval* and then index into the python list something like this (ignoring boundary conditions):
>
>  def get_percentile(list, percentile):
>    return sample_list[len(list) * percentile / 100]
>
> min would be first array element in sample_list,
> max would be last array element in sample_list.
>
> And I'll definitely try using .sort instead of sorted(), thx Jeff.
>
> make sense?
>
> -ben
>
>
> ----- Original Message -----
>> From: "Mark Nelson" <mark.a.nelson@gmail.com>
>> To: "Ben England" <bengland@redhat.com>, "Jens Axboe" <axboe@kernel.dk>
>> Cc: "Martin Steigerwald" <ms@teamix.de>, fio@vger.kernel.org, "Mark Nelson" <mnelson@redhat.com>
>> Sent: Tuesday, May 24, 2016 12:20:19 PM
>> Subject: Re: fiologparser.py
>>
>> I've got a version that removes the dependency and appears to return the
>> same values:
>>
>> https://github.com/axboe/fio/pull/181
>>
>> Going through the code though, it looks like the -A values are computed
>> differently than in the other original functions.  In the original
>> get_contribution function, all samples within the bounds are counted,
>> along with samples that are only partially within the bounds.  Each
>> sample is weighted based on the duration it overlapped with the sample
>> period:
>>
>> https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L195-L198
>>
>> For -A, only the samples that are totally within the bounds are counted,
>> and are weighted equally regardless of how much of the period was spent in
>> that sample:
>>
>> https://github.com/axboe/fio/blob/master/tools/fiologparser.py#L173
>>
>> Thus if you look at say the average from -a:
>>
>> fiologparser.py -a *clat*
>>
>> 1000, 11582.770
>> 2000, 14033.844
>> 3000, 17087.446
>> 4000, 17946.245
>> 5000, 14554.196
>> 6000, 14407.804
>> 7000, 15218.106
>> 8000, 15157.951
>>
>> the results are quite a bit different from -A:
>>
>> fiologparser.py -A *clat* | tr -s "," " " | cut -f1,4 -d" "
>>
>> 0.000000 11902.719298
>> 1000.000000 13247.750000
>> 2000.000000 14270.549020
>> 3000.000000 15092.192308
>> 4000.000000 14127.472727
>> 5000.000000 12880.137931
>> 6000.000000 15296.735849
>> 7000.000000 14857.306122
>> 8000.000000 14854.766667
>>
>> Mark
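
To make the difference Mark describes concrete, here is a small standalone sketch that contrasts the two averaging rules on made-up data. It is not the fiologparser.py code: the `Sample` record and both function names are hypothetical, and normalising the weighted sum by the total overlap is an assumption of this sketch rather than something confirmed by the thread.

```python
from collections import namedtuple

# Hypothetical sample record (not taken from fiologparser.py).
Sample = namedtuple("Sample", ["start", "end", "value"])


def weighted_average(samples, t0, t1):
    """Average in which each sample counts in proportion to its overlap with [t0, t1)."""
    total = 0.0
    weight = 0.0
    for s in samples:
        overlap = min(s.end, t1) - max(s.start, t0)
        if overlap > 0:
            total += s.value * overlap
            weight += overlap
    return total / weight if weight else None


def contained_average(samples, t0, t1):
    """Average over only the samples lying entirely inside [t0, t1], equally weighted."""
    values = [s.value for s in samples if s.start >= t0 and s.end <= t1]
    return sum(values) / len(values) if values else None


# Made-up data for illustration only.
samples = [
    Sample(0, 100, 10.0),
    Sample(100, 200, 10.0),
    Sample(200, 1200, 100.0),  # long sample that straddles t=1000
]
w = weighted_average(samples, 0, 1000)
c = contained_average(samples, 0, 1000)
print(w, c)
```

In this example the long sample dominates the overlap-weighted average but is dropped entirely by the contained-only rule, so the two results diverge in the same way the -a and -A columns above do.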
>>
>>
>> On 05/24/2016 10:35 AM, Ben England wrote:
>>> OK we'll remove the dependencies, I still want to have the -A option
>>> supported.
>>> -ben
>>>
>>> ----- Original Message -----
>>>> From: "Jens Axboe" <axboe@kernel.dk>
>>>> To: "Ben England" <bengland@redhat.com>, "Mark Nelson"
>>>> <mark.a.nelson@gmail.com>
>>>> Cc: "Martin Steigerwald" <ms@teamix.de>, fio@vger.kernel.org, "Mark
>>>> Nelson" <mnelson@redhat.com>
>>>> Sent: Tuesday, May 24, 2016 11:28:39 AM
>>>> Subject: Re: fiologparser.py
>>>>
>>>> On 05/24/2016 09:22 AM, Ben England wrote:
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Mark Nelson" <mark.a.nelson@gmail.com>
>>>>>> To: "Ben England" <bengland@redhat.com>, "Martin Steigerwald"
>>>>>> <ms@teamix.de>
>>>>>> Cc: fio@vger.kernel.org, "Mark Nelson" <mnelson@redhat.com>, "Jens
>>>>>> Axboe"
>>>>>> <axboe@kernel.dk>
>>>>>> Sent: Tuesday, May 24, 2016 10:04:14 AM
>>>>>> Subject: Re: fiologparser.py
>>>>>>
>>>>>> Let's see if we can remove the numpy and scipy dependencies.  It looks
>>>>>> like we are just using it for min/average/median/max/percentile
>>>>>> calculations.  It would be nice if users didn't need anything other than
>>>>>> argparse.
>>>>>>
>>>>>
>>>>> Just curious, why is scipy a problem?  Is it because CBT isn't a
>>>>> package so you don't get dependencies handled when you install it?  You
>>>>> are correct, it's easy to remove the dependencies, I just didn't know it
>>>>> was causing problems for people.  You can get percentiles from just
>>>>> sorting the sample values and indexing into the array at the appropriate
>>>>> offset, I was just trying to re-use existing classes.
>>>>
>>>> It's not necessarily a problem, but the fewer dependencies you have, the
>>>> easier it is for people to use. I do the same for fio, try to have as
>>>> few external dependencies as possible. Remember, not everybody is
>>>> running on Linux...
>>>>
>>>> --
>>>> Jens Axboe
>>>>
>>>>
>>



* Re: fiologparser.py
  2016-05-24 15:22     ` fiologparser.py Ben England
  2016-05-24 15:28       ` fiologparser.py Jens Axboe
@ 2016-05-25  7:20       ` Martin Steigerwald
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2016-05-25  7:20 UTC (permalink / raw)
  To: Ben England; +Cc: Mark Nelson, fio, Mark Nelson, Jens Axboe

On Tuesday, 24 May 2016 11:22:06 CEST Ben England wrote:
> 
> ----- Original Message -----
> 
> > From: "Mark Nelson" <mark.a.nelson@gmail.com>
> > To: "Ben England" <bengland@redhat.com>, "Martin Steigerwald" <ms@teamix.de>
> > Cc: fio@vger.kernel.org, "Mark Nelson" <mnelson@redhat.com>, "Jens Axboe" <axboe@kernel.dk>
> > Sent: Tuesday, May 24, 2016 10:04:14 AM
> > Subject: Re: fiologparser.py
> > 
> > Let's see if we can remove the numpy and scipy dependencies.  It looks
> > like we are just using it for min/average/median/max/percentile
> > calculations.  It would be nice if users didn't need anything other than
> > argparse.
> 
> Just curious, why is scipy a problem?  Is it because CBT isn't a package so
> you don't get dependencies handled when you install it?  You are correct,
> it's easy to remove the dependencies, I just didn't know it was causing
> problems for people.  You can get percentiles from just sorting the sample
> values and indexing into the array at the appropriate offset, I was just
> trying to re-use existing classes.

There is a very clear reason and that is:

After this operation, 37.9 MB of additional disk space will be used.

Yes, that's right, that is the size of the python-scipy package in Debian
Unstable.

So I am glad you intend to remove the dependency, as I really would like to
move fiologparser to /usr/bin, but that currently means python-scipy would
need to become a hard dependency and I am probably not going to do that.

So for fiologparser to move into a more prominent location, I'd need it to be
free of that dependency.

Thank you,



* Re: fiologparser.py
  2016-05-25  2:04               ` fiologparser.py Mark Nelson
@ 2016-05-25  8:58                 ` Martin Steigerwald
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2016-05-25  8:58 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Ben England, Jens Axboe, fio, Mark Nelson

On Tuesday, 24 May 2016 21:04:13 CEST Mark Nelson wrote:
> On 05/24/2016 03:47 PM, Ben England wrote:
> > Mark, I didn't notice the sample weighting code before.  Weighting of
> > samples might work for averaging, but it doesn't work for percentiles,
> > min or max provided by -A option.  I guess for min this won't be an issue
> > generally, since min-latency samples will probably fall entirely within a
> > time interval.  But for max or higher percentiles it will *definitely* be
> > an issue.   For example, a really high latency sample could be the max
> > for a whole range of time intervals.
> I went back and reworked the print and per-interval functions so that
> they are part of a Printer class and Interval class respectively.  It
> cleaned the code up pretty nicely.  I was also able to integrate the
> "-A" code to use a lot of the existing statistics and formatting code.
> It now supports the "-d" flag for example.
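
Ben's point quoted above, that one very slow I/O can be the max for a whole range of time intervals, can be illustrated with a short standalone sketch. The tuple layout `(start_ms, end_ms, latency)`, the function name, and the `contained_only` switch are hypothetical and used only for illustration; this is not how fiologparser itself is structured.

```python
def per_interval_max(samples, interval, n_intervals, contained_only=False):
    """Max latency per interval.

    samples: list of (start_ms, end_ms, latency) tuples (hypothetical layout).
    contained_only=False counts every sample that overlaps the interval;
    contained_only=True counts only samples lying entirely inside it.
    """
    result = []
    for i in range(n_intervals):
        t0, t1 = i * interval, (i + 1) * interval
        if contained_only:
            values = [lat for (start, end, lat) in samples
                      if start >= t0 and end <= t1]
        else:
            values = [lat for (start, end, lat) in samples
                      if start < t1 and end > t0]
        result.append(max(values) if values else None)
    return result


# Made-up data for illustration only.
samples = [
    (200, 500, 300),
    (500, 4000, 3500),   # one very slow I/O spanning several intervals
    (1200, 1500, 300),
    (2100, 2300, 200),
    (4000, 4400, 400),
]
overlap_max = per_interval_max(samples, 1000, 5)
contained_max = per_interval_max(samples, 1000, 5, contained_only=True)
print(overlap_max)
print(contained_max)
```

Under the overlap rule the slow sample is reported as the max of the first four intervals, while the contained-only rule never reports it at all and leaves one interval with no samples.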

While at it, Mark, could you also do the renaming to fiologparser?

Otherwise I'd prepare a patch, but then we risk a trivial merge conflict :)

Thanks,
Martin



Thread overview: 14+ messages
2016-05-24 10:35 fiologparser.py Martin Steigerwald
2016-05-24 11:12 ` fiologparser.py Ben England
2016-05-24 14:04   ` fiologparser.py Mark Nelson
2016-05-24 14:11     ` fiologparser.py Jens Axboe
2016-05-24 14:17       ` fiologparser.py Mark Nelson
2016-05-24 15:22     ` fiologparser.py Ben England
2016-05-24 15:28       ` fiologparser.py Jens Axboe
2016-05-24 15:35         ` fiologparser.py Ben England
2016-05-24 16:20           ` fiologparser.py Mark Nelson
2016-05-24 18:38             ` fiologparser.py Jeff Furlong
2016-05-24 20:47             ` fiologparser.py Ben England
2016-05-25  2:04               ` fiologparser.py Mark Nelson
2016-05-25  8:58                 ` fiologparser.py Martin Steigerwald
2016-05-25  7:20       ` fiologparser.py Martin Steigerwald
