* Events out of order
@ 2014-12-31 1:48 Phillip Susi
2014-12-31 1:51 ` Phillip Susi
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Phillip Susi @ 2014-12-31 1:48 UTC (permalink / raw)
To: linux-btrace
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I captured a trace with blktrace -d /dev/sde -o - > trace, then later
parsed it with blkparse -i - < trace, and the events are out of order.
It appears that they come in blocks from one cpu, then the other cpu,
and the timestamps roll backwards when switching to the other cpu.
Why aren't they in the proper order?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBCgAGBQJUo1XSAAoJENRVrw2cjl5RjS8H/2fiGBHhJ9PRwjAMEBAIqtgj
Oaz3BDI/ilA9+feGs39/YWrKJbgPr3p+1Sat6I7rxwt6sSPm8+O/NkwEQDZQAund
R5Urinh0Yov5I0mxUawhJs24YzrWiA8IfEdb7Nu772ihfKge6j32h734RvG9HSDh
SaSAOFAoAdDQx2aMR/AFdq9VlO2/dwpUgW2CxyehO341lHwpE6qjAITQ6aLtOJr9
lMAwTxifBdzsnGjafe/bawTqmwCj0ZA+SVTsDHJYFDcO1EpCo6llt3CG4bgXHXB0
Kk6RBHS+7aXlQybCPg5J2/jvfED0ohX4CFqGxuKU5t1+ZJKcYDCDZtd2rp7C6b4=qZgI
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
@ 2014-12-31 1:51 ` Phillip Susi
2014-12-31 1:53 ` Alireza Haghdoost
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Phillip Susi @ 2014-12-31 1:51 UTC (permalink / raw)
To: linux-btrace
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I captured a trace with blktrace -d /dev/sde -d /dev/sdf -o - > trace,
then parsed it with blkparse -i - < trace, and I have noticed that the
events are out of order. They appear to come in batches from one cpu,
then the other, and when transitioning from one batch to the other,
the timestamps roll backwards.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBCgAGBQJUo1a3AAoJENRVrw2cjl5RPEYH/1q7V0fKvMXRtGWtaHrxkefp
Wzjyh8MekQL2gRROq0Fa7pjwUtGAsfuFW9K8RmFtJaNBemM+zWRXkF7vX6CPjVG+
IqO65M628b12A/GLmwPkK1vZ4qM/x2SE9h+JpUH068QUKQw+EyEVEIxrflC6al+S
LBY/mST0HOFog3dYOK7Bx6rKbRfyNOiw729xltq8Vpwn5HVHl3R3ZzvFir1ZV1VX
2foZquHEfFJsb9hHE4QZylBCqLE6txFnAW6PitqNtJyy8ow82Q+gyw2xSeTvxGJH
4KDWVbsTLYRihIf1dg+Uk7gEfIejO21H8Cb/O33RD7iHQO+uyuaE1UP9TNgyS6w=hj/D
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
2014-12-31 1:51 ` Phillip Susi
@ 2014-12-31 1:53 ` Alireza Haghdoost
2014-12-31 3:23 ` Phillip Susi
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Alireza Haghdoost @ 2014-12-31 1:53 UTC (permalink / raw)
To: linux-btrace
On Tue, Dec 30, 2014 at 7:48 PM, Phillip Susi <psusi@ubuntu.com> wrote:
>
> I captured a trace with blktrace -d /dev/sde -o - > trace, then later
> parsed it with blkparse -i - < trace, and the events are out of order.
> It appears that they come in blocks from one cpu, then the other cpu,
> and the timestamps roll backwards when switching to the other cpu.
> Why aren't they in the proper order?
Can you please share an example ? How much time backward you observed
after switch ?
--
Alireza
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
2014-12-31 1:51 ` Phillip Susi
2014-12-31 1:53 ` Alireza Haghdoost
@ 2014-12-31 3:23 ` Phillip Susi
2014-12-31 4:29 ` Alireza Haghdoost
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Phillip Susi @ 2014-12-31 3:23 UTC (permalink / raw)
To: linux-btrace
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
On 12/30/2014 08:53 PM, Alireza Haghdoost wrote:
> Can you please share an example ? How much time backward you
> observed after switch ?
8,64 2 606 0.019550771 480 D R 2816 + 128 [kworker/2:1H]
8,64 2 607 0.019551057 480 R R 2816 + 128 [0]
8,64 2 608 0.019551151 480 I R 2816 + 128 [kworker/2:1H]
8,64 2 609 0.019979263 0 D R 2816 + 128 [swapper/2]
8,64 2 610 0.019980728 0 D R 2944 + 128 [swapper/2]
8,64 2 611 0.020403185 0 C R 2816 + 128 [0]
8,64 2 612 0.020416690 0 R R 2944 + 128 [0]
8,64 2 613 0.020416798 0 I R 2944 + 128 [swapper/2]
8,64 2 614 0.020420237 480 D R 2944 + 128 [kworker/2:1H]
8,64 2 615 0.020420511 480 R R 2944 + 128 [0]
8,80 2 2 0.000000042 27566 G R 0 + 8 [hdparm]
8,80 2 3 0.000001130 27566 P N [hdparm]
8,80 2 4 0.000003388 27566 Q R 8 + 8 [hdparm]
8,80 2 5 0.000004343 27566 M R 8 + 8 [hdparm]
8,80 2 6 0.000005793 27566 Q R 16 + 8 [hdparm]
8,80 2 7 0.000005979 27566 M R 16 + 8 [hdparm]
8,80 2 8 0.000006934 27566 Q R 24 + 8 [hdparm]
8,80 2 9 0.000007092 27566 M R 24 + 8 [hdparm]
Come to think of it, in this example it didn't even switch to the
other cpu, just the other device ( though typically when that happens
it seems to also switch the cpu ).
Here is one that is worse:
8,64 1 1026 0.075645831 27565 Q R 7800 + 8 [hdparm]
8,64 1 1027 0.075646030 27565 M R 7800 + 8 [hdparm]
8,64 1 1028 0.075646911 27565 Q R 7808 + 8 [hdparm]
8,64 1 1029 0.075647447 27565 G R 7808 + 8 [hdparm]
8,80 0 1 0.035128312 27566 Q R 5376 + 8 [hdparm]
8,80 0 2 0.035130922 27566 G R 5376 + 8 [hdparm]
8,80 0 3 0.035131724 27566 P N [hdparm]
Would you like the raw trace file?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBCgAGBQJUo2wwAAoJENRVrw2cjl5RIooH/juX4p9Cz9XLjYbpLL0HtZhb
JnGw4yaZQEBeqBs5h1LL3A6JZnNHx/yAxMO6LteWr4aQf+nKWUyLD+Yj0jid3n+u
MF4yAwYI7PtJZtRzYpY0YlCorQdX2GKj1ajcX4mCHXrmMr2AXr4Ionyek042Q/Lp
cpJRJUL7OfHHhtpTRTf5Xg36yLo/nK1cIFU0bNz21U3o7jWKRKo7PAmulZymCdN0
4FULblVSqiC3iqWPFQC8ScYLxZa0uKnsA73hFB4NbVXVXOoDor2DbgiQoF8gKVWn
MmYlMwf5wfBwuPDTaTcVVLNA7weK6Gs79T71/6FLiJGnpaeedDAQyKvYx3j+5PQ=nd6f
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
` (2 preceding siblings ...)
2014-12-31 3:23 ` Phillip Susi
@ 2014-12-31 4:29 ` Alireza Haghdoost
2014-12-31 14:36 ` Phillip Susi
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Alireza Haghdoost @ 2014-12-31 4:29 UTC (permalink / raw)
To: linux-btrace
Seems to me like you don't run blktrace and blkparse appropriately.
blktrace stores IO trace in multiple files where each file is
associated to a CPU core. However, you force the output of blktrace to
redirect into a single file. This make some timing Confusion for
blkparse in the next step. You are using live blkparse option (-i -)
inappropriately. If you want to run live blkparse, you need to feed it
through blktrace pipe. Like:
#blktrace -d /dev/sda -o - | blkparse -i -
If you want to store trace in a file then parse it offline, you need
to use --input option of blkparse.
Hope it helps
Alireza
On Tue, Dec 30, 2014 at 9:23 PM, Phillip Susi <psusi@ubuntu.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 12/30/2014 08:53 PM, Alireza Haghdoost wrote:
>> Can you please share an example ? How much time backward you
>> observed after switch ?
>
> 8,64 2 606 0.019550771 480 D R 2816 + 128 [kworker/2:1H]
> 8,64 2 607 0.019551057 480 R R 2816 + 128 [0]
> 8,64 2 608 0.019551151 480 I R 2816 + 128 [kworker/2:1H]
> 8,64 2 609 0.019979263 0 D R 2816 + 128 [swapper/2]
> 8,64 2 610 0.019980728 0 D R 2944 + 128 [swapper/2]
> 8,64 2 611 0.020403185 0 C R 2816 + 128 [0]
> 8,64 2 612 0.020416690 0 R R 2944 + 128 [0]
> 8,64 2 613 0.020416798 0 I R 2944 + 128 [swapper/2]
> 8,64 2 614 0.020420237 480 D R 2944 + 128 [kworker/2:1H]
> 8,64 2 615 0.020420511 480 R R 2944 + 128 [0]
> 8,80 2 2 0.000000042 27566 G R 0 + 8 [hdparm]
> 8,80 2 3 0.000001130 27566 P N [hdparm]
> 8,80 2 4 0.000003388 27566 Q R 8 + 8 [hdparm]
> 8,80 2 5 0.000004343 27566 M R 8 + 8 [hdparm]
> 8,80 2 6 0.000005793 27566 Q R 16 + 8 [hdparm]
> 8,80 2 7 0.000005979 27566 M R 16 + 8 [hdparm]
> 8,80 2 8 0.000006934 27566 Q R 24 + 8 [hdparm]
> 8,80 2 9 0.000007092 27566 M R 24 + 8 [hdparm]
>
> Come to think of it, in this example it didn't even switch to the
> other cpu, just the other device ( though typically when that happens
> it seems to also switch the cpu ).
>
> Here is one that is worse:
>
> 8,64 1 1026 0.075645831 27565 Q R 7800 + 8 [hdparm]
> 8,64 1 1027 0.075646030 27565 M R 7800 + 8 [hdparm]
> 8,64 1 1028 0.075646911 27565 Q R 7808 + 8 [hdparm]
> 8,64 1 1029 0.075647447 27565 G R 7808 + 8 [hdparm]
> 8,80 0 1 0.035128312 27566 Q R 5376 + 8 [hdparm]
> 8,80 0 2 0.035130922 27566 G R 5376 + 8 [hdparm]
> 8,80 0 3 0.035131724 27566 P N [hdparm]
>
> Would you like the raw trace file?
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQEcBAEBCgAGBQJUo2wwAAoJENRVrw2cjl5RIooH/juX4p9Cz9XLjYbpLL0HtZhb
> JnGw4yaZQEBeqBs5h1LL3A6JZnNHx/yAxMO6LteWr4aQf+nKWUyLD+Yj0jid3n+u
> MF4yAwYI7PtJZtRzYpY0YlCorQdX2GKj1ajcX4mCHXrmMr2AXr4Ionyek042Q/Lp
> cpJRJUL7OfHHhtpTRTf5Xg36yLo/nK1cIFU0bNz21U3o7jWKRKo7PAmulZymCdN0
> 4FULblVSqiC3iqWPFQC8ScYLxZa0uKnsA73hFB4NbVXVXOoDor2DbgiQoF8gKVWn
> MmYlMwf5wfBwuPDTaTcVVLNA7weK6Gs79T71/6FLiJGnpaeedDAQyKvYx3j+5PQ> =nd6f
> -----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
` (3 preceding siblings ...)
2014-12-31 4:29 ` Alireza Haghdoost
@ 2014-12-31 14:36 ` Phillip Susi
2014-12-31 14:58 ` Chris Mason
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Phillip Susi @ 2014-12-31 14:36 UTC (permalink / raw)
To: linux-btrace
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 12/30/2014 11:29 PM, Alireza Haghdoost wrote:
> Seems to me like you don't run blktrace and blkparse
> appropriately. blktrace stores IO trace in multiple files where
> each file is associated to a CPU core. However, you force the
> output of blktrace to redirect into a single file. This make some
> timing Confusion for blkparse in the next step. You are using live
> blkparse option (-i -) inappropriately. If you want to run live
> blkparse, you need to feed it through blktrace pipe. Like:
> #blktrace -d /dev/sda -o - | blkparse -i -
>
> If you want to store trace in a file then parse it offline, you
> need to use --input option of blkparse.
Whether it is piped directly, or buffered in a file on disk makes no
difference; blkparse sees exactly the same input byte for byte, and
produces the same output. The man page recommends this interactive
method and doesn't mention that it borks up the ordering, but I
suppose I can try it the other way and see if that helps. If the
interactive method doesn't work correctly it should either be fixed or
the man page should mention this and not recommend its use.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
iQEcBAEBAgAGBQJUpAn0AAoJENRVrw2cjl5Rt98H/R+QLa7VQ2uWIv2yJmUPQDX/
9+qExZ1lEpJ39dtzDVZJwVISNQvUtEUA42hzPetuO3NQjiamThsh4koEw2cvr46G
m809gpMy9LifkxqG8ufE+sp9FAReoRhZ/9wjFWiMcWLP4JnmssRnJCs953PzVYNF
i6kYeRXqKJS8kWE1j9lXPtFCDhjhBRVoHCevBa3IxdP5Ri83YpQdLscvoXjSNSwe
NOiJloFO0tvbO7Lj5UY1rhB8ZNLxyGqSHeiEMUWiqTZONffEBoDmNiS6om36Qpjp
2kim1Ug838a61AgGkZHR8m4CRoUMLKDprG9mKOGlZKH8pIp9Y5iO0Md17OiY48c=VgcL
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
` (4 preceding siblings ...)
2014-12-31 14:36 ` Phillip Susi
@ 2014-12-31 14:58 ` Chris Mason
2014-12-31 15:09 ` Phillip Susi
2014-12-31 15:19 ` Jens Axboe
7 siblings, 0 replies; 9+ messages in thread
From: Chris Mason @ 2014-12-31 14:58 UTC (permalink / raw)
To: linux-btrace
On Wed, Dec 31, 2014 at 9:36 AM, Phillip Susi <psusi@ubuntu.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/30/2014 11:29 PM, Alireza Haghdoost wrote:
>> Seems to me like you don't run blktrace and blkparse
>> appropriately. blktrace stores IO trace in multiple files where
>> each file is associated to a CPU core. However, you force the
>> output of blktrace to redirect into a single file. This make some
>> timing Confusion for blkparse in the next step. You are using live
>> blkparse option (-i -) inappropriately. If you want to run live
>> blkparse, you need to feed it through blktrace pipe. Like:
>> #blktrace -d /dev/sda -o - | blkparse -i -
>>
>> If you want to store trace in a file then parse it offline, you
>> need to use --input option of blkparse.
>
> Whether it is piped directly, or buffered in a file on disk makes no
> difference; blkparse sees exactly the same input byte for byte, and
> produces the same output. The man page recommends this interactive
> method and doesn't mention that it borks up the ordering, but I
> suppose I can try it the other way and see if that helps. If the
> interactive method doesn't work correctly it should either be fixed or
> the man page should mention this and not recommend its use.
It's been a while since I looked at this, but looks like blkparse has a
pipeline mode and a more exact mode. In pipeline mode, it's really
trying to get data to the screen quickly and without using a huge
amount of resources. In order to put things in order, it would need to
buffer the entire stream before outputting anything. So instead it
works in chunks, which is what you're seeing.
It enables pipeline mode when the input file from -i is a pipe. You'll
get better results if you allow blktrace to output into a collection of
files and pass the prefix to blkparse.
instead of blktrace -o - use blktrace -o prefix ; blkparse -i prefix
For multiple devices, iowatcher uses a directory to store all of them,
which is easier.
If you really want a single file result, take the per-cpu files from
blktrace, run them through blkparse and add -d dumpfile, which will
sort and pack all the results into a single binary file. It's
compressable and you can then delete all the per-cpu files.
-chris
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
` (5 preceding siblings ...)
2014-12-31 14:58 ` Chris Mason
@ 2014-12-31 15:09 ` Phillip Susi
2014-12-31 15:19 ` Jens Axboe
7 siblings, 0 replies; 9+ messages in thread
From: Phillip Susi @ 2014-12-31 15:09 UTC (permalink / raw)
To: linux-btrace
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 12/31/2014 9:58 AM, Chris Mason wrote:
> It's been a while since I looked at this, but looks like blkparse
> has a pipeline mode and a more exact mode. In pipeline mode, it's
> really trying to get data to the screen quickly and without using a
> huge amount of resources. In order to put things in order, it
> would need to buffer the entire stream before outputting anything.
> So instead it works in chunks, which is what you're seeing.
Makes sense; it would be nice if the man page mentioned this drawback.
> It enables pipeline mode when the input file from -i is a pipe.
> You'll get better results if you allow blktrace to output into a
> collection of files and pass the prefix to blkparse.
>
> instead of blktrace -o - use blktrace -o prefix ; blkparse -i
> prefix
>
> For multiple devices, iowatcher uses a directory to store all of
> them, which is easier.
>
> If you really want a single file result, take the per-cpu files
> from blktrace, run them through blkparse and add -d dumpfile, which
> will sort and pack all the results into a single binary file. It's
> compressable and you can then delete all the per-cpu files.
Thanks; I thought I read something like that at one point but couldn't
find it in the man page last night.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
iQEcBAEBAgAGBQJUpBGiAAoJENRVrw2cjl5R5EcH/AoLTOH7JzM62//2/V4Ely5U
OTnDkD4Aq2ydS7SQ3otidPnCJ46zCwlACZxJHknxM3RkLkXQUsqjXJ9J0l6Bitav
u06jZHyr07d9ysRexeLMpPJsdzFzrY41G8ZjTNFJJBGgNv/y8zlqT1gXZ8QjJYjw
XSsBVy+8D7yPJRAwX7eXbrfwKRfe16N+ujQC1DXnVOVKDKeP1inG7vRDdUuzjciA
63Gf8Kdk8E4Rlh9zyR8QWSB15Tp92/bLIeoJjZIoqsLlrRFoizREyPLpVXhg3Zhn
M6C+/H9cm+wB2KDiDESLm7uNyokYeAyQGbesj7/WTj/Lccv29drFMF9cCYLk4nk¿Zz
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Events out of order
2014-12-31 1:48 Events out of order Phillip Susi
` (6 preceding siblings ...)
2014-12-31 15:09 ` Phillip Susi
@ 2014-12-31 15:19 ` Jens Axboe
7 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2014-12-31 15:19 UTC (permalink / raw)
To: linux-btrace
On 12/31/2014 08:09 AM, Phillip Susi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/31/2014 9:58 AM, Chris Mason wrote:
>> It's been a while since I looked at this, but looks like blkparse
>> has a pipeline mode and a more exact mode. In pipeline mode, it's
>> really trying to get data to the screen quickly and without using a
>> huge amount of resources. In order to put things in order, it
>> would need to buffer the entire stream before outputting anything.
>> So instead it works in chunks, which is what you're seeing.
>
> Makes sense; it would be nice if the man page mentioned this drawback.
>
>> It enables pipeline mode when the input file from -i is a pipe.
>> You'll get better results if you allow blktrace to output into a
>> collection of files and pass the prefix to blkparse.
>>
>> instead of blktrace -o - use blktrace -o prefix ; blkparse -i
>> prefix
>>
>> For multiple devices, iowatcher uses a directory to store all of
>> them, which is easier.
>>
>> If you really want a single file result, take the per-cpu files
>> from blktrace, run them through blkparse and add -d dumpfile, which
>> will sort and pack all the results into a single binary file. It's
>> compressable and you can then delete all the per-cpu files.
>
> Thanks; I thought I read something like that at one point but couldn't
> find it in the man page last night.
What Chris writes is completely correct. Pre-process the per-cpu files
and use that binary sorted trace file for the inspection (or replay) tools.
BTW, the behavior is documented in the README:
----
If you want to do live tracing, you can pipe the data between blktrace
and blkparse:
% blktrace -d <device> -o - | blkparse -i -
This has a small risk of displaying some traces a little out of sync,
since it will do batch sorts of input events.
----
But I agree, would not hurt to put this in the man pages too.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-12-31 15:19 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-31 1:48 Events out of order Phillip Susi
2014-12-31 1:51 ` Phillip Susi
2014-12-31 1:53 ` Alireza Haghdoost
2014-12-31 3:23 ` Phillip Susi
2014-12-31 4:29 ` Alireza Haghdoost
2014-12-31 14:36 ` Phillip Susi
2014-12-31 14:58 ` Chris Mason
2014-12-31 15:09 ` Phillip Susi
2014-12-31 15:19 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).