* bad trace magic
@ 2008-11-19 11:38 Martin Peschke
2008-11-19 11:49 ` Jens Axboe
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Martin Peschke @ 2008-11-19 11:38 UTC (permalink / raw)
To: linux-btrace
Hi,
I run across blktrace complaints about broken traces once in a while.
I have pimped up the debugging output in verify_trace():
--- bad trace magic ---
magic 0x00000000
sequence 0x00000000
time 0x0000000000000000
sector 0x0000000000000000
bytes 0x65617407 <--- begin of trace (trace magic)
action 0x000005b0
pid 0x0000029c
device 0x7c185ac0
cpu 0x00000000
error 0x0036
pdu_len 0x18c0
00000000 00000000 00000000 00000000 00000000 00000000
65617407 000005b0 0000029c 7c185ac0 00000000 003618c0
bad trace in /sys/kernel/debug/block/sdbm/trace1
Traces appear to get out of place by some bytes. The offset varies.
I have got a couple of questions:
Does blktrace stop processing of a trace stream in this case, doesn't
it?
Next I am going to dump the trace which precedes a broken trace.
Any better ideas how to debug this one?
Thanks,
Martin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
@ 2008-11-19 11:49 ` Jens Axboe
2008-11-19 12:43 ` Martin Peschke
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2008-11-19 11:49 UTC (permalink / raw)
To: linux-btrace
On Wed, Nov 19 2008, Martin Peschke wrote:
> Hi,
>
> I run across blktrace complaints about broken traces once in a while.
> I have pimped up the debugging output in verify_trace():
>
> --- bad trace magic ---
> magic 0x00000000
> sequence 0x00000000
> time 0x0000000000000000
> sector 0x0000000000000000
> bytes 0x65617407 <--- begin of trace (trace magic)
> action 0x000005b0
> pid 0x0000029c
> device 0x7c185ac0
> cpu 0x00000000
> error 0x0036
> pdu_len 0x18c0
>
> 00000000 00000000 00000000 00000000 00000000 00000000
> 65617407 000005b0 0000029c 7c185ac0 00000000 003618c0
>
> bad trace in /sys/kernel/debug/block/sdbm/trace1
>
> Traces appear to get out of place by some bytes. The offset varies.
>
>
> I have got a couple of questions:
>
> Does blktrace stop processing of a trace stream in this case, doesn't
> it?
>
> Next I am going to dump the trace which precedes a broken trace.
> Any better ideas how to debug this one?
I think Alan and I debugged something like this in the early days of
blktrace. Is your stream consistently off when this happens? Does the
sequence numbers jump?
The most likely culprit is a relay bug, and there's recently been some
fixing for that.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
2008-11-19 11:49 ` Jens Axboe
@ 2008-11-19 12:43 ` Martin Peschke
2008-11-19 12:44 ` Jens Axboe
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Martin Peschke @ 2008-11-19 12:43 UTC (permalink / raw)
To: linux-btrace
On Wed, 2008-11-19 at 12:49 +0100, Jens Axboe wrote:
> On Wed, Nov 19 2008, Martin Peschke wrote:
> > Hi,
> >
> > I run across blktrace complaints about broken traces once in a while.
> > I have pimped up the debugging output in verify_trace():
> > ...
> I think Alan and I debugged something like this in the early days of
> blktrace. Is your stream consistently off when this happens? Does the
> sequence numbers jump?
Well, consistently off? If it's off once, is there a next trace?
Sequence numbers appear to jump, from 183e8 to f1b1 in this case:
--- bad trace magic ---
magic 0x00000001
sequence 0x0001007a
time 0x000000000000fb40
sector 0x000000000005925e
bytes 0x65617407
action 0x0000f1b1
pid 0x00000773
device 0xec86baa8
cpu 0x00000000
error 0x007f
pdu_len 0x8430
00000001 0001007a 00000000 0000fb40 00000000 0005925e 65617407 0000f1b1
00000773 ec86baa8 00000000 007f8430
bad trace in /sys/kernel/debug/block/sdae/trace0
--- previous trace ---
magic 0x65617407
sequence 0x000183e8
time 0x00000815371024d2
sector 0x000000000052cdf0
bytes 0x00001000
action 0x09820008
pid 0x000019bf
device 0x041000e0
cpu 0x00000000
error 0x0000
pdu_len 0x0000
65617407 000183e8 00000815 371024d2 00000000 0052cdf0 00001000 09820008
000019bf 041000e0 00000000 00000000
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
2008-11-19 11:49 ` Jens Axboe
2008-11-19 12:43 ` Martin Peschke
@ 2008-11-19 12:44 ` Jens Axboe
2008-11-19 15:00 ` Martin Peschke
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2008-11-19 12:44 UTC (permalink / raw)
To: linux-btrace
On Wed, Nov 19 2008, Martin Peschke wrote:
>
> On Wed, 2008-11-19 at 12:49 +0100, Jens Axboe wrote:
> > On Wed, Nov 19 2008, Martin Peschke wrote:
> > > Hi,
> > >
> > > I run across blktrace complaints about broken traces once in a while.
> > > I have pimped up the debugging output in verify_trace():
> > > ...
> > I think Alan and I debugged something like this in the early days of
> > blktrace. Is your stream consistently off when this happens? Does the
> > sequence numbers jump?
>
> Well, consistently off? If it's off once, is there a next trace?
How are you running blktrace? If you have the blktrace output files, you
can inspect what it would have done next. blkparse may not do more after
it detects a bad trace, though.
>
> Sequence numbers appear to jump, from 183e8 to f1b1 in this case:
>
> --- bad trace magic ---
> magic 0x00000001
> sequence 0x0001007a
> time 0x000000000000fb40
> sector 0x000000000005925e
> bytes 0x65617407
> action 0x0000f1b1
> pid 0x00000773
> device 0xec86baa8
> cpu 0x00000000
> error 0x007f
> pdu_len 0x8430
>
> 00000001 0001007a 00000000 0000fb40 00000000 0005925e 65617407 0000f1b1
> 00000773 ec86baa8 00000000 007f8430
So that's interesting. What byte offset did this happen at? And what is
your subbuffer size for blktrace?
>
> bad trace in /sys/kernel/debug/block/sdae/trace0
>
> --- previous trace ---
> magic 0x65617407
> sequence 0x000183e8
> time 0x00000815371024d2
> sector 0x000000000052cdf0
> bytes 0x00001000
> action 0x09820008
> pid 0x000019bf
> device 0x041000e0
> cpu 0x00000000
> error 0x0000
> pdu_len 0x0000
>
> 65617407 000183e8 00000815 371024d2 00000000 0052cdf0 00001000 09820008
> 000019bf 041000e0 00000000 00000000
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrace" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
` (2 preceding siblings ...)
2008-11-19 12:44 ` Jens Axboe
@ 2008-11-19 15:00 ` Martin Peschke
2008-11-19 15:07 ` Jens Axboe
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Martin Peschke @ 2008-11-19 15:00 UTC (permalink / raw)
To: linux-btrace
On Wed, 2008-11-19 at 13:44 +0100, Jens Axboe wrote:
> How are you running blktrace?
blktrace -a issue -a drv_data -a complete -w 43200 -o
- /dev/sda ... /dev/sdz | blkiomon
> If you have the blktrace output files, you
> can inspect what it would have done next.
I am concerned about pdu_len being some random number then.
I don't think blktrace would find a valid trace after taking a random
number of bytes for a trace payload.
> > Sequence numbers appear to jump, from 183e8 to f1b1 in this case:
> >
> > --- bad trace magic ---
> > magic 0x00000001
> > sequence 0x0001007a
> > time 0x000000000000fb40
> > sector 0x000000000005925e
> > bytes 0x65617407
> > action 0x0000f1b1
> > pid 0x00000773
> > device 0xec86baa8
> > cpu 0x00000000
> > error 0x007f
> > pdu_len 0x8430
> >
> > 00000001 0001007a 00000000 0000fb40 00000000 0005925e 65617407 0000f1b1
> > 00000773 ec86baa8 00000000 007f8430
>
> So that's interesting. What byte offset did this happen at? And what is
> your subbuffer size for blktrace?
see tip_subbuf output (from flush_subbuf_file()) below
--- bad trace magic ---
magic 0x00017000
sequence 0x40010011
time 0x0000346f04100080
sector 0x0000000100000018
bytes 0x00000001
action 0x00010080
pid 0x00000000
device 0x0000bbea
cpu 0x00000000
error 0x0014
pdu_len 0x8d98
00017000 40010011 0000346f 04100080 00000001 00000018
00000001 00010080 00000000 0000bbea 00000000 00148d98
bad trace in /sys/kernel/debug/block/sdy/trace1
ts->buf 0x2004c5027c0, ts->len 0x108, ts->max_len 0x80000,
offset 0x48, t 0x2004c502808
--- previous trace ---
magic 0x65617407
sequence 0x0001a232
time 0x00000e5bb87cd096
sector 0x0000000000843c08
bytes 0x00001000
action 0x40010011
pid 0x00003467
device 0x04100080
cpu 0x00000001
error 0x0000
pdu_len 0x0018
65617407 0001a232 00000e5b b87cd096 00000000 00843c08
00001000 40010011 00003467 04100080 00000001 00000018
--- bad trace magic ---
magic 0x00001000
sequence 0x09410007
time 0x00003469008000c0
sector 0x0000000000000000
bytes 0x65617407
action 0x0000fae2
pid 0x00000e44
device 0xbbefaa91
cpu 0x00000000
error 0x008f
pdu_len 0xd358
00001000 09410007 00003469 008000c0 00000000 00000000
65617407 0000fae2 00000e44 bbefaa91 00000000 008fd358
bad trace in /sys/kernel/debug/block/sdm/trace0
ts->buf 0x20060280910, ts->len 0xc00, ts->max_len 0x80000,
offset 0xf0, t 0x20060280a00
--- previous trace ---
magic 0x65617407
sequence 0x0000fadd
time 0x65617407000067d8
sector 0x00000d9ba13e3b74
bytes 0x00000000
action 0x00500030
pid 0x00001000
device 0x40010011
cpu 0x00003466
error 0x0080
pdu_len 0x00c0
65617407 0000fadd 65617407 000067d8 00000d9b a13e3b74
00000000 00500030 00001000 40010011 00003466 008000c0
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
` (3 preceding siblings ...)
2008-11-19 15:00 ` Martin Peschke
@ 2008-11-19 15:07 ` Jens Axboe
2008-11-19 17:41 ` Martin Peschke
2008-11-26 9:33 ` Martin Peschke
6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2008-11-19 15:07 UTC (permalink / raw)
To: linux-btrace
On Wed, Nov 19 2008, Martin Peschke wrote:
>
> On Wed, 2008-11-19 at 13:44 +0100, Jens Axboe wrote:
> > How are you running blktrace?
>
> blktrace -a issue -a drv_data -a complete -w 43200 -o
> - /dev/sda ... /dev/sdz | blkiomon
Ah ok. How hard is this to reproduce? If you could try a non-pipe
approach, then it would be nice to see if it happens there as well.
> > If you have the blktrace output files, you
> > can inspect what it would have done next.
>
> I am concerned about pdu_len being some random number then.
> I don't think blktrace would find a valid trace after taking a random
> number of bytes for a trace payload.
You are right, it's definitely a rocky road...
> > > Sequence numbers appear to jump, from 183e8 to f1b1 in this case:
> > >
> > > --- bad trace magic ---
> > > magic 0x00000001
> > > sequence 0x0001007a
> > > time 0x000000000000fb40
> > > sector 0x000000000005925e
> > > bytes 0x65617407
> > > action 0x0000f1b1
> > > pid 0x00000773
> > > device 0xec86baa8
> > > cpu 0x00000000
> > > error 0x007f
> > > pdu_len 0x8430
> > >
> > > 00000001 0001007a 00000000 0000fb40 00000000 0005925e 65617407 0000f1b1
> > > 00000773 ec86baa8 00000000 007f8430
> >
> > So that's interesting. What byte offset did this happen at? And what is
> > your subbuffer size for blktrace?
>
> see tip_subbuf output (from flush_subbuf_file()) below
>
> --- bad trace magic ---
> magic 0x00017000
> sequence 0x40010011
> time 0x0000346f04100080
> sector 0x0000000100000018
> bytes 0x00000001
> action 0x00010080
> pid 0x00000000
> device 0x0000bbea
> cpu 0x00000000
> error 0x0014
> pdu_len 0x8d98
>
> 00017000 40010011 0000346f 04100080 00000001 00000018
> 00000001 00010080 00000000 0000bbea 00000000 00148d98
>
> bad trace in /sys/kernel/debug/block/sdy/trace1
>
> ts->buf 0x2004c5027c0, ts->len 0x108, ts->max_len 0x80000,
> offset 0x48, t 0x2004c502808
>
> --- previous trace ---
> magic 0x65617407
> sequence 0x0001a232
> time 0x00000e5bb87cd096
> sector 0x0000000000843c08
> bytes 0x00001000
> action 0x40010011
> pid 0x00003467
> device 0x04100080
> cpu 0x00000001
> error 0x0000
> pdu_len 0x0018
>
> 65617407 0001a232 00000e5b b87cd096 00000000 00843c08
> 00001000 40010011 00003467 04100080 00000001 00000018
>
>
> --- bad trace magic ---
> magic 0x00001000
> sequence 0x09410007
> time 0x00003469008000c0
> sector 0x0000000000000000
> bytes 0x65617407
> action 0x0000fae2
> pid 0x00000e44
> device 0xbbefaa91
> cpu 0x00000000
> error 0x008f
> pdu_len 0xd358
>
> 00001000 09410007 00003469 008000c0 00000000 00000000
> 65617407 0000fae2 00000e44 bbefaa91 00000000 008fd358
>
> bad trace in /sys/kernel/debug/block/sdm/trace0
>
> ts->buf 0x20060280910, ts->len 0xc00, ts->max_len 0x80000,
> offset 0xf0, t 0x20060280a00
>
> --- previous trace ---
> magic 0x65617407
> sequence 0x0000fadd
> time 0x65617407000067d8
> sector 0x00000d9ba13e3b74
> bytes 0x00000000
> action 0x00500030
> pid 0x00001000
> device 0x40010011
> cpu 0x00003466
> error 0x0080
> pdu_len 0x00c0
>
> 65617407 0000fadd 65617407 000067d8 00000d9b a13e3b74
> 00000000 00500030 00001000 40010011 00003466 008000c0
So you don't happen to have a stream of data we can inspect _after_ the
bad magic occured?
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
` (4 preceding siblings ...)
2008-11-19 15:07 ` Jens Axboe
@ 2008-11-19 17:41 ` Martin Peschke
2008-11-26 9:33 ` Martin Peschke
6 siblings, 0 replies; 8+ messages in thread
From: Martin Peschke @ 2008-11-19 17:41 UTC (permalink / raw)
To: linux-btrace
On Wed, 2008-11-19 at 16:07 +0100, Jens Axboe wrote:
> On Wed, Nov 19 2008, Martin Peschke wrote:
> > On Wed, 2008-11-19 at 13:44 +0100, Jens Axboe wrote:
> > > How are you running blktrace?
> >
> > blktrace -a issue -a drv_data -a complete -w 43200 -o
> > - /dev/sda ... /dev/sdz | blkiomon
>
> Ah ok. How hard is this to reproduce?
It's not too hard. I get several bad traces within about half an hour.
> If you could try a non-pipe approach, then it would be nice to see if
> it happens there as well.
I have tried this for about 90 minutes:
blktrace -a complete -a issue -a
drv_data /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday
/dev/sdbc /dev/sdbh /dev/sdbm /dev/sdbr /dev/sdbw /dev/sdca
/dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa
No bad traces. Need to check whether verify_trace() was called at all...
For this test I had to reduce the number of devices traced due to
the oom-killer picking on blktrace.
Or did you have something else in mind?
> So you don't happen to have a stream of data we can inspect _after_
> the bad magic occured?
I was trying with the command line above.
Would be blktrace in network mode worth another try?
I have just run the my initial test case on a recent linux git tree.
Same issue, bad traces:
--- bad trace magic ---
magic 0x00000000
sequence 0x00000000
time 0x65617407000014d5 <--- first trace magic
sector 0x656174070000a75c <--- another trace magic?!
bytes 0x0000011e
action 0x473dbce5
pid 0x00000000
device 0x0095b5e0
cpu 0x00004000
error 0x0181
pdu_len 0x0008
00000000 00000000 65617407 000014d5 65617407 0000a75c
0000011e 473dbce5 00000000 0095b5e0 00004000 01810008
bad trace in /sys/kernel/debug/block/sdaa/trace0
no previous trace
ts->buf 0x0x81fa0610, ts->len 0x958, ts->max_len 0x80000, offset 0x0, t
0x0x81fa0610
Then I ran blkrawverify /dev/sdaa - no issue found.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bad trace magic
2008-11-19 11:38 bad trace magic Martin Peschke
` (5 preceding siblings ...)
2008-11-19 17:41 ` Martin Peschke
@ 2008-11-26 9:33 ` Martin Peschke
6 siblings, 0 replies; 8+ messages in thread
From: Martin Peschke @ 2008-11-26 9:33 UTC (permalink / raw)
To: linux-btrace
On Wed, 2008-11-19 at 16:07 +0100, Jens Axboe wrote:
> So you don't happen to have a stream of data we can inspect _after_ the
> bad magic occured?
Hi Jens,
any suggestions, a commmand line, for capturing that stream of data?
I have been looking into the kernel side of it, and blktrace/relay
appear to continue writing data when a bad trace has been detected in
user space.
Thanks,
Martin
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-11-26 9:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-19 11:38 bad trace magic Martin Peschke
2008-11-19 11:49 ` Jens Axboe
2008-11-19 12:43 ` Martin Peschke
2008-11-19 12:44 ` Jens Axboe
2008-11-19 15:00 ` Martin Peschke
2008-11-19 15:07 ` Jens Axboe
2008-11-19 17:41 ` Martin Peschke
2008-11-26 9:33 ` Martin Peschke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox