linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known
@ 2008-01-19 16:54 Török Edwin
  2008-01-19 18:30 ` Arjan van de Ven
  0 siblings, 1 reply; 4+ messages in thread
From: Török Edwin @ 2008-01-19 16:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux Kernel

Hi Arjan,

One of the top reasons in latencytop is
Unknown reason (blk_execute_rq+0x7a/0xd)          152.3 msec        
49.0 msec

Its from hddtemp, hald-addon-storage, ...

I wrote a simple systemtap script [1] to find out where its called from,
and its definitely from the ioctl (see below).

1820322406
hddtemp(3999):                                                                                                      
<- latency of 153260223 ns
Returning from: 0xffffffff80311ab0 : blk_execute_rq+0x0/0xd0 []
Returning to  : 0xffffffff80315580 : sg_io+0x200/0x3c0 []
 0xffffffff80315a02 : scsi_cmd_ioctl+0x2c2/0x410 []
 0xffffffff80417153 : ip_queue_xmit+0x2b3/0x420 []
 0xffffffff880c3be9 : sd_ioctl+0x99/0xf0 [sd_mod]
 0xffffffff80313839 : blkdev_driver_ioctl+0x39/0x40 []
 0xffffffff80313ab8 : blkdev_ioctl+0x278/0x710 []
 0xffffffff803e540d : release_sock+0x8d/0xa0 []
 0xffffffff803e54b3 : lock_sock_nested+0x93/0xa0 []
 0xffffffff8029a4ff : d_alloc+0x6f/0x1a0 []
 0xffffffff802f102f : security_d_instantiate+0x1f/0x30 []
 0xffffffff803e540d : release_sock+0x8d/0xa0 []
 0xffffffff8043c8c8 : inet_accept+0x88/0xc0 []
 0xffffffff803e408e : move_addr_to_user+0x5e/0x70 []
 0xffffffff802f0bb1 : security_socket_post_accept+0x11/0x20 []
 0xffffffff803e472e : sys_accept+0x13e/0x1d0 []
 0xffffffff802af12b : block_ioctl+0x1b/0x20 []
 0xffffffff8029482b : do_ioctl+0x1b/0x70 []
 0xffffffff80294a6d : vfs_ioctl+0x1ed/0x290 []
 0xffffffff80296281 : sys_select+0x41/0x1b0 []
 0xffffffff80294b92 : sys_ioctl+0x82/0xa0 []
 0xffffffff8020c18e : system_call+0x7e/0x83 []


I suggest adding an annotation to make this reason known to latencytop
by annotating sd_ioctl/scsi_cmd_ioctl, see my patch below.
After my patch, instead of blk_execute_rq the real reason is showed:

Cause                                               Maximum          Average
SCSI device ioctl                                  34.2 msec        
14.4 msec

I noticed there's a similar annotation in sr.c for cdrom_ioctl, but sr.c
also has annotations for scsi_nonblockable_ioctl, and scsi_ioctl.
If they are needed for sd_block_ioctl too, maybe
scsi_nonblockable_ioctl/scsi_ioctl should have the annotation themselves.

I also noticed some unsigned overflows, my top latency was sometimes a
very large number (18................................), which
was probably -1, or some other negative number.  I haven't found a way
to reproduce it yet (it happens very rarely).

[1] Systemtap script:
#!/usr/bin/stap -v
global start
probe kernel.function("blk_execute_rq") {
        start[tid()] = gettimeofday_ns()
}

probe kernel.function("blk_execute_rq").return {
        # if we have not seen this before
        if (!([tid()] in start)) next
        delta = gettimeofday_ns() - start[tid()]
        # show backtraces for latencies over 1ms
        if(delta > 1000000) {
                printf("%s <- latency of %d ns\n", thread_indent(1), delta)
                print_backtrace ()
        }
        delete start[tid()]
}

P.S.: LatencyTop could have a feature to ask for more details on unknown
latency reasons, and it would generate a systemtap script itself, that
would show the backtraces to help figure out whats going on.

Best regards,
--Edwin

----

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a69b155..93ccfc0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -686,6 +687,7 @@ static int sd_ioctl(struct inode * inode, struct
file * filp,
     struct scsi_device *sdp = scsi_disk(disk)->device;
     void __user *p = (void __user *)arg;
     int error;
+    struct latency_entry reason;
    
     SCSI_LOG_IOCTL(1, printk("sd_ioctl: disk=%s, cmd=0x%x\n",
                         disk->disk_name, cmd));
@@ -710,7 +712,9 @@ static int sd_ioctl(struct inode * inode, struct
file * filp,
         case SCSI_IOCTL_GET_BUS_NUMBER:
             return scsi_ioctl(sdp, cmd, p);
         default:
+            set_latency_reason("SCSI device ioctl", &reason);
             error = scsi_cmd_ioctl(filp, disk->queue, disk, cmd, p);
+            restore_latency_reason(&reason);
             if (error != -ENOTTY)
                 return error;
     }

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known
  2008-01-19 16:54 [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known Török Edwin
@ 2008-01-19 18:30 ` Arjan van de Ven
  2008-01-19 19:40   ` Frank Ch. Eigler
  2008-01-19 19:47   ` Török Edwin
  0 siblings, 2 replies; 4+ messages in thread
From: Arjan van de Ven @ 2008-01-19 18:30 UTC (permalink / raw)
  To: Török Edwin; +Cc: Linux Kernel

Török Edwin wrote:
> Hi Arjan,
> 
> One of the top reasons in latencytop is
> Unknown reason (blk_execute_rq+0x7a/0xd)          152.3 msec        
> 49.0 msec
> 
> Its from hddtemp, hald-addon-storage, ...
> 
> I wrote a simple systemtap script [1] to find out where its called from,
> and its definitely from the ioctl (see below).
> 
> 1820322406
> hddtemp(3999):                                                                                                      
> <- latency of 153260223 ns
> Returning from: 0xffffffff80311ab0 : blk_execute_rq+0x0/0xd0 []
> Returning to  : 0xffffffff80315580 : sg_io+0x200/0x3c0 []
>  0xffffffff80315a02 : scsi_cmd_ioctl+0x2c2/0x410 []
>  0xffffffff80417153 : ip_queue_xmit+0x2b3/0x420 []
>  0xffffffff880c3be9 : sd_ioctl+0x99/0xf0 [sd_mod]
>  0xffffffff80313839 : blkdev_driver_ioctl+0x39/0x40 []
>  0xffffffff80313ab8 : blkdev_ioctl+0x278/0x710 []
>  0xffffffff803e540d : release_sock+0x8d/0xa0 []
>  0xffffffff803e54b3 : lock_sock_nested+0x93/0xa0 []
>  0xffffffff8029a4ff : d_alloc+0x6f/0x1a0 []
>  0xffffffff802f102f : security_d_instantiate+0x1f/0x30 []
>  0xffffffff803e540d : release_sock+0x8d/0xa0 []
>  0xffffffff8043c8c8 : inet_accept+0x88/0xc0 []
>  0xffffffff803e408e : move_addr_to_user+0x5e/0x70 []
>  0xffffffff802f0bb1 : security_socket_post_accept+0x11/0x20 []
>  0xffffffff803e472e : sys_accept+0x13e/0x1d0 []
>  0xffffffff802af12b : block_ioctl+0x1b/0x20 []
>  0xffffffff8029482b : do_ioctl+0x1b/0x70 []
>  0xffffffff80294a6d : vfs_ioctl+0x1ed/0x290 []
>  0xffffffff80296281 : sys_select+0x41/0x1b0 []
>  0xffffffff80294b92 : sys_ioctl+0x82/0xa0 []
>  0xffffffff8020c18e : system_call+0x7e/0x83 []
> 
> 
> I suggest adding an annotation to make this reason known to latencytop
> by annotating sd_ioctl/scsi_cmd_ioctl, see my patch below.
> After my patch, instead of blk_execute_rq the real reason is showed:
> 
> Cause                                               Maximum          Average
> SCSI device ioctl                                  34.2 msec        
> 14.4 msec

great! I'll put this into my patchkit!

> 
> I noticed there's a similar annotation in sr.c for cdrom_ioctl, but sr.c
> also has annotations for scsi_nonblockable_ioctl, and scsi_ioctl.
> If they are needed for sd_block_ioctl too, maybe
> scsi_nonblockable_ioctl/scsi_ioctl should have the annotation themselves.

yeah I'll take a look at this; some of the annotations are still a bit iffy in their placement

> 
> I also noticed some unsigned overflows, my top latency was sometimes a
> very large number (18................................), which
> was probably -1, or some other negative number.  I haven't found a way
> to reproduce it yet (it happens very rarely).

I've not seen this; I'll take another peak at the code.

> P.S.: LatencyTop could have a feature to ask for more details on unknown
> latency reasons, and it would generate a systemtap script itself, that
> would show the backtraces to help figure out whats going on.

I've considered having an option inside the code itself that would print a
backtrace for, say, an unknown latency larger than 100msec
but the code got ugly (the accounting gets done in the scheduler path,
where you can't printk).
I'll try again ;-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known
  2008-01-19 18:30 ` Arjan van de Ven
@ 2008-01-19 19:40   ` Frank Ch. Eigler
  2008-01-19 19:47   ` Török Edwin
  1 sibling, 0 replies; 4+ messages in thread
From: Frank Ch. Eigler @ 2008-01-19 19:40 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: =?ISO-8859-1?Q?T=F6r=F6k_Edwin?=, Linux Kernel

Arjan van de Ven <arjan@linux.intel.com> writes:

> Török Edwin wrote:
>> [...]
>> P.S.: LatencyTop could have a feature to ask for more details on unknown
>> latency reasons, and it would generate a systemtap script itself, that
>> would show the backtraces to help figure out whats going on.
>
> I've considered having an option inside the code itself that would print a
> backtrace for, say, an unknown latency larger than 100msec
> but the code got ugly (the accounting gets done in the scheduler path,
> where you can't printk). [...]

Systemtap does not use printk, and can generate decent backtraces from
the kernel (and into userspace before too long).  (FWIW, it seems to
be only a small stretch to implement tools like *top via systemtap
rather than custom-written kernel code.)

- FChE

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known
  2008-01-19 18:30 ` Arjan van de Ven
  2008-01-19 19:40   ` Frank Ch. Eigler
@ 2008-01-19 19:47   ` Török Edwin
  1 sibling, 0 replies; 4+ messages in thread
From: Török Edwin @ 2008-01-19 19:47 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux Kernel

Arjan van de Ven wrote:
> Török Edwin wrote:
>>
>> Cause                                               Maximum         
>> Average
>> SCSI device ioctl                                  34.2 msec       
>> 14.4 msec
>
> great! I'll put this into my patchkit!

Thanks.

>> I also noticed some unsigned overflows, my top latency was sometimes a
>> very large number (18................................), which
>> was probably -1, or some other negative number.  I haven't found a way
>> to reproduce it yet (it happens very rarely).
>
>
> I've not seen this; I'll take another peak at the code.

I just captured this:

Cause                                               Maximum          Average
Waiting for userspace lock                        18446744073638300.0
msec        9223372036783524.0 msec
Waiting for processes to die (waitpid)            18446744073638296.0
msec        9223372036783520.0 msec
Waiting for event (poll)                          18446744073565216.0
msec        764505121372.0 msec
SCSI device ioctl                                  35.2 msec        
13.7 msec
Application requested delay                        19.5 msec         
0.0 msec
page fault                                         18.6 msec         
4.0 msec
Reading from file                                  16.6 msec         
1.5 msec
Waiting for buffer IO                              15.2 msec         
3.0 msec
mutex lock                                         15.0 msec        
15.0 msec

I was also looking at /proc/latency_stats using watch, and I've seen
that there were negative number.
Unfortunately I couldn't copy+paste it, because it was gone too fast.

I was running a program with 4 threads doing usleep(0) in an inf-loop.
However I can't reproduce the overflow now with the same program.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-01-19 19:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-19 16:54 [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known Török Edwin
2008-01-19 18:30 ` Arjan van de Ven
2008-01-19 19:40   ` Frank Ch. Eigler
2008-01-19 19:47   ` Török Edwin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).