* Who does determine the number of requests that can be serving simultaneously in a storage?
@ 2011-01-07  3:21 Yuehai Xu
  2011-01-07  5:16 ` Yuehai Xu
  2011-01-07  8:21 ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07  3:21 UTC (permalink / raw)
To: linux-kernel; +Cc: axboe, cmm, rwheeler, vgoyal, czoccolo, yhxu

Hi all,

We know that multiple requests can be served simultaneously by a storage
device because of NCQ. My question is: what determines the exact number of
requests being serviced at once by an HDD or SSD? Since different devices
(HDD/SSD) differ in how many requests they can serve in parallel, how does
the OS know that number?

I have not been able to figure out the answer. I know the dispatch routine
in the I/O schedulers is elevator_dispatch_fn, which is invoked in two
places: __elv_next_request() and elv_drain_elevator(). I cannot work out
the exact condition that triggers elv_drain_elevator(); from the source
code, it appears to dispatch all requests pending in the scheduler onto the
request_queue, from which requests are then selected and dispatched to the
device driver.

__elv_next_request() is actually invoked by blk_peek_request(), which in
turn is invoked by blk_fetch_request(). From their comments, I understand
that a single request is fetched from the request_queue and dispatched to
the corresponding device driver. However, blk_fetch_request() is called
from a number of places, and it fetches requests repeatedly with different
stop conditions. Which condition actually controls the number of requests
that can be served at the same time? The OS would of course not dispatch
more requests than the storage can serve; for example, an SSD might serve
32 requests simultaneously while an HDD serves only 4. But how does the OS
handle this?

Do different file systems handle this differently?
I appreciate any help. Thanks very much! Yuehai ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
@ 2011-01-07  5:16 ` Yuehai Xu
  2011-01-07  8:21 ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07  5:16 UTC (permalink / raw)
To: linux-kernel; +Cc: axboe, cmm, rwheeler, vgoyal, czoccolo, yhxu

Hi all,

I added a patch to kernel 2.6.35.7 to find out the number of pending and
in-flight requests for an SSD (Intel M). The benchmark I use is postmark;
the access pattern is small random writes. Below is the patch:

diff -Nur orig/block/blk-core.c new/block/blk-core.c
--- orig/block/blk-core.c	2011-01-06 23:57:39.000000000 -0500
+++ new/block/blk-core.c	2011-01-06 23:57:46.000000000 -0500
@@ -37,6 +37,10 @@
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);
 
+
+#define complete_log(queue, fmt, args...) \
+	blk_add_trace_msg(queue, "yh " fmt, ##args)
+
 static int __make_request(struct request_queue *q, struct bio *bio);
 
 /*
@@ -1974,6 +1978,10 @@
 	if (!req->bio)
 		return false;
 
+
+	complete_log(req->q, "nr_sorted: %u, in_flight[0]: %u, in_flight[1]: %u",
+		req->q->nr_sorted, req->q->in_flight[0], req->q->in_flight[1]);
+
 	trace_block_rq_complete(req->q, req);
 
 	/*

Here, I consider nr_sorted in struct request_queue to be the number of
pending requests, and in_flight[0/1] to be the number of async/sync
requests being served simultaneously by the SSD. This little patch should
show the pending and in-flight request counts exactly.

The result from blkparse shows that the number of requests being served by
the SSD is almost always 1, while the pending count varies from tens to
around a hundred. Since I only run a single postmark process, does that
mean the number of requests in flight is always 1 even though the storage
is an SSD?
I have tested ext3/ext4/btrfs on cfq/deadline/noop; the number of requests
in flight is the same in every case, almost never more than 1.

Thanks,
Yuehai

On Thu, Jan 6, 2011 at 10:21 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> [...]
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
  2011-01-07  5:16 ` Yuehai Xu
@ 2011-01-07  8:21 ` Jens Axboe
  2011-01-07 13:00 ` Yuehai Xu
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07  8:21 UTC (permalink / raw)
To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On 2011-01-07 04:21, Yuehai Xu wrote:
> [...]
> The OS would of course not dispatch more requests than the storage can
> serve; for example, an SSD might serve 32 requests simultaneously while
> an HDD serves only 4. But how does the OS handle this?

The driver has to take care of this. Since requests are pulled by the
driver, it knows when to stop asking for more work.

BTW, your depth of 4 for the HDD seems a bit odd. Typically all SATA
drives share the same queue depth, limited by what NCQ provides (32).

-- 
Jens Axboe
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  8:21 ` Jens Axboe
@ 2011-01-07 13:00 ` Yuehai Xu
  2011-01-07 13:10 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 13:00 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

I added a tracepoint so that I can read nr_sorted and in_flight[0/1] of
the request_queue when a request completes. I take nr_sorted to be the
number of pending requests and in_flight[0/1] to be the number being
served by the storage. Do these two fields mean what I think they do?

The benchmark I use is postmark, which simulates an email server; over 90%
of the requests are small random writes. The storage is an Intel M SSD.
I would expect in_flight[0/1] to be much greater than 1, but the result
shows that this value is almost always 1 no matter which I/O scheduler
(CFQ/deadline/noop) or filesystem (ext4/ext3/btrfs) I use. Is that normal?
B.T.W., I only run a single postmark process.

> The driver has to take care of this. Since requests are pulled by the
> driver, it knows when to stop asking for more work.
>
> BTW, your depth of 4 for the HDD seems a bit odd. Typically all SATA
> drives share the same queue depth, limited by what NCQ provides (32).

That number was arbitrary, just an example.

Thanks,
Yuehai
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:00 ` Yuehai Xu
@ 2011-01-07 13:10 ` Jens Axboe
  2011-01-07 13:23 ` Yuehai Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07 13:10 UTC (permalink / raw)
To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

Please don't top-post, thanks.

On 2011-01-07 14:00, Yuehai Xu wrote:
> I added a tracepoint so that I can read nr_sorted and in_flight[0/1] of
> the request_queue when a request completes. I take nr_sorted to be the
> number of pending requests and in_flight[0/1] to be the number being
> served by the storage. Do these two fields mean what I think they do?

nr_sorted is the number of requests that reside in the IO scheduler,
that is, requests that are not on the dispatch list yet. in_flight is
the number that the driver is currently handling. So I think your
understanding is correct.

If you look at where you added your trace point, there is already a
trace point right there. I would recommend that you use blktrace, and
then use btt to parse it. That will give you all sorts of queueing
information.

> The benchmark I use is postmark, which simulates an email server; over
> 90% of the requests are small random writes. The storage is an Intel M
> SSD. I would expect in_flight[0/1] to be much greater than 1, but the
> result shows that this value is almost always 1 no matter which I/O
> scheduler (CFQ/deadline/noop) or filesystem (ext4/ext3/btrfs) I use.
> Is that normal?

Depends, do you have more requests pending in the IO scheduler? I'm
assuming you already verified that NCQ is active and working for your
drive.

-- 
Jens Axboe
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:10 ` Jens Axboe
@ 2011-01-07 13:23 ` Yuehai Xu
  2011-01-07 15:30 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 13:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On Fri, Jan 7, 2011 at 8:10 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> Please don't top-post, thanks.

I am really sorry for that.

> nr_sorted is the number of requests that reside in the IO scheduler,
> that is, requests that are not on the dispatch list yet. in_flight is
> the number that the driver is currently handling. So I think your
> understanding is correct.
>
> If you look at where you added your trace point, there is already a
> trace point right there. I would recommend that you use blktrace, and
> then use btt to parse it. That will give you all sorts of queueing
> information.

Yes, but I noticed that the existing trace points can't report nr_sorted
and in_flight[0/1] directly, so I just added a few lines. The result is
from blktrace, and I use blkparse to analyze it; that should be much the
same as what you said about btt (which I don't know yet, sorry).

> Depends, do you have more requests pending in the IO scheduler? I'm
> assuming you already verified that NCQ is active and working for your
> drive.

Yes, nr_sorted (the number of pending requests) stays around 100, and
hdparm shows that NCQ is enabled.

Thanks,
Yuehai
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:23 ` Yuehai Xu
@ 2011-01-07 15:30 ` Jens Axboe
  2011-01-07 16:45 ` Yuehai Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07 15:30 UTC (permalink / raw)
To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On 2011-01-07 14:23, Yuehai Xu wrote:
> Yes, but I noticed that the existing trace points can't report
> nr_sorted and in_flight[0/1] directly, so I just added a few lines.
> The result is from blktrace, and I use blkparse to analyze it; that
> should be much the same as what you said about btt (which I don't know
> yet, sorry).

You don't need those values. btt can just look at dispatch and
completion events to get an exact queue depth number at any point in
time.

> Yes, nr_sorted (the number of pending requests) stays around 100, and
> hdparm shows that NCQ is enabled.

I would double check that NCQ really is active, not just supported. For
instance, the controller needs to support it too. If you look at dmesg
from when it detects your drive, it should print the queue depth used.
Or you can check queue_depth in the sysfs scsi_device directory. It
should be 31 for NCQ enabled (32 in total, but one has to be reserved
for error handling), or 1 if it isn't.

-- 
Jens Axboe
* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 15:30 ` Jens Axboe
@ 2011-01-07 16:45 ` Yuehai Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 16:45 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

> You don't need those values. btt can just look at dispatch and
> completion events to get an exact queue depth number at any point in
> time.

Cool tool, I need to learn it further.

> I would double check that NCQ really is active, not just supported. For
> instance, the controller needs to support it too. If you look at dmesg
> from when it detects your drive, it should print the queue depth used.
> Or you can check queue_depth in the sysfs scsi_device directory. It
> should be 31 for NCQ enabled (32 in total, but one has to be reserved
> for error handling), or 1 if it isn't.

You are right. Info from dmesg:

[    1.476660] ata4.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 0/32)

and queue_depth is 1, which shows that NCQ is not actually active on the
SSD. I also get an error ("bash: /sys/block/sdb/device/queue_depth:
Permission denied") when I run
"echo 31 > /sys/block/sdb/device/queue_depth", even with root privileges.

Anyway, one thing is certain: the number of requests in flight is 1
because NCQ is not active on the SSD.

I really appreciate your help, thanks very much!

Yuehai
end of thread, other threads: [~2011-01-07 16:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed --
links below jump to the message on this page --)
2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
2011-01-07  5:16 ` Yuehai Xu
2011-01-07  8:21 ` Jens Axboe
2011-01-07 13:00 ` Yuehai Xu
2011-01-07 13:10 ` Jens Axboe
2011-01-07 13:23 ` Yuehai Xu
2011-01-07 15:30 ` Jens Axboe
2011-01-07 16:45 ` Yuehai Xu