public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Who does determine the number of requests that can be serving simultaneously in a storage?
@ 2011-01-07  3:21 Yuehai Xu
  2011-01-07  5:16 ` Yuehai Xu
  2011-01-07  8:21 ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07  3:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: axboe, cmm, rwheeler, vgoyal, czoccolo, yhxu

Hi all,

We know that several requests can be served simultaneously by a storage
device because of NCQ. My question is: who determines the exact number
of requests being serviced in an HDD or SSD? Since different storage
devices (HDD/SSD) differ in how many requests they can serve at once,
how does the OS know the exact number of requests that can be served
simultaneously?

I have failed to figure out the answer. I know the dispatch routine in
the I/O schedulers is elevator_dispatch_fn, which is invoked in two
places: one is __elv_next_request(), the other is elv_drain_elevator().
I cannot figure out the exact condition that triggers
elv_drain_elevator(); from the source code, I know that it should
dispatch all the requests in the pending queue to the "request_queue",
from which a request is selected to dispatch to the device driver.

As for __elv_next_request(), it is actually invoked by
blk_peek_request(), which in turn is invoked by blk_fetch_request().
From their comments, I understand that a single request should be
fetched from the "request_queue" and dispatched to the corresponding
device driver. However, I notice that blk_fetch_request() is invoked in
a number of places, fetching requests in loops with different stop
conditions. Which condition is the one that controls the number of
requests that can be served at the same time? The OS would of course
not dispatch more requests than the storage can serve; for example, an
SSD might serve 32 requests simultaneously while an HDD serves only 4.
But how does the OS handle this?

Do different file systems handle this differently?

I appreciate any help. Thanks very much!

Yuehai

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
@ 2011-01-07  5:16 ` Yuehai Xu
  2011-01-07  8:21 ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07  5:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: axboe, cmm, rwheeler, vgoyal, czoccolo, yhxu

Hi all,

I added a patch to kernel 2.6.35.7 in order to find out the number of
pending/serving requests for an SSD (Intel M). The benchmark I use is
postmark; the access pattern is small random writes. Below is the patch:

diff -Nur orig/block/blk-core.c new/block/blk-core.c
--- orig/block/blk-core.c       2011-01-06 23:57:39.000000000 -0500
+++ new/block/blk-core.c        2011-01-06 23:57:46.000000000 -0500
@@ -37,6 +37,10 @@
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);

+#define complete_log(queue, fmt, args...)     \
+        blk_add_trace_msg(queue, "yh " fmt, ##args)
+
 static int __make_request(struct request_queue *q, struct bio *bio);

 /*
@@ -1974,6 +1978,10 @@
        if (!req->bio)
                return false;

+       complete_log(req->q, "nr_sorted: %u, in_flight[0]: %u, in_flight[1]: %u",
+               req->q->nr_sorted, req->q->in_flight[0], req->q->in_flight[1]);
+
        trace_block_rq_complete(req->q, req);

        /*

Here, I take nr_sorted in "struct request_queue" as the number of
pending requests, while in_flight[0/1] represents the number of
async/sync requests being served simultaneously by the SSD. I think
this little patch should show exactly the number of pending/serving
requests.

The result from blkparse shows that the number of requests being served
by the SSD is almost always 1, while the pending number varies from
tens to a hundred. Since I only run a single postmark process, does
that mean the number of requests in flight is always 1 even though the
storage is an SSD?

I have tested ext3/ext4/btrfs with cfq/deadline/noop; the number of
requests in flight is the same, almost never more than 1.

Thanks,
Yuehai

On Thu, Jan 6, 2011 at 10:21 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> Hi all,
>
> We know that several requests can be served simultaneously by a
> storage device because of NCQ. My question is: who determines the
> exact number of requests being serviced in an HDD or SSD? Since
> different storage devices (HDD/SSD) differ in how many requests they
> can serve at once, how does the OS know the exact number of requests
> that can be served simultaneously?
>
> I have failed to figure out the answer. I know the dispatch routine
> in the I/O schedulers is elevator_dispatch_fn, which is invoked in
> two places: one is __elv_next_request(), the other is
> elv_drain_elevator(). I cannot figure out the exact condition that
> triggers elv_drain_elevator(); from the source code, I know that it
> should dispatch all the requests in the pending queue to the
> "request_queue", from which a request is selected to dispatch to the
> device driver.
>
> As for __elv_next_request(), it is actually invoked by
> blk_peek_request(), which in turn is invoked by blk_fetch_request().
> From their comments, I understand that a single request should be
> fetched from the "request_queue" and dispatched to the corresponding
> device driver. However, I notice that blk_fetch_request() is invoked
> in a number of places, fetching requests in loops with different stop
> conditions. Which condition is the one that controls the number of
> requests that can be served at the same time? The OS would of course
> not dispatch more requests than the storage can serve; for example,
> an SSD might serve 32 requests simultaneously while an HDD serves
> only 4. But how does the OS handle this?
>
> Do different file systems handle this differently?
>
> I appreciate any help. Thanks very much!
>
> Yuehai
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
  2011-01-07  5:16 ` Yuehai Xu
@ 2011-01-07  8:21 ` Jens Axboe
  2011-01-07 13:00   ` Yuehai Xu
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07  8:21 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On 2011-01-07 04:21, Yuehai Xu wrote:
> Hi all,
> 
> We know that several requests can be served simultaneously by a
> storage device because of NCQ. My question is: who determines the
> exact number of requests being serviced in an HDD or SSD? Since
> different storage devices (HDD/SSD) differ in how many requests they
> can serve at once, how does the OS know the exact number of requests
> that can be served simultaneously?
> 
> I have failed to figure out the answer. I know the dispatch routine
> in the I/O schedulers is elevator_dispatch_fn, which is invoked in
> two places: one is __elv_next_request(), the other is
> elv_drain_elevator(). I cannot figure out the exact condition that
> triggers elv_drain_elevator(); from the source code, I know that it
> should dispatch all the requests in the pending queue to the
> "request_queue", from which a request is selected to dispatch to the
> device driver.
> 
> As for __elv_next_request(), it is actually invoked by
> blk_peek_request(), which in turn is invoked by blk_fetch_request().
> From their comments, I understand that a single request should be
> fetched from the "request_queue" and dispatched to the corresponding
> device driver. However, I notice that blk_fetch_request() is invoked
> in a number of places, fetching requests in loops with different stop
> conditions. Which condition is the one that controls the number of
> requests that can be served at the same time? The OS would of course
> not dispatch more requests than the storage can serve; for example,
> an SSD might serve 32 requests simultaneously while an HDD serves
> only 4. But how does the OS handle this?

The driver has to take care of this. Since requests are pulled by the
driver, it knows when to stop asking for more work.

BTW, your depth of 4 for the HDD seems a bit odd. Typically all SATA
drives share the same queue depth, limited by what NCQ provides (32).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07  8:21 ` Jens Axboe
@ 2011-01-07 13:00   ` Yuehai Xu
  2011-01-07 13:10     ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 13:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

I added a tracepoint so that I can get nr_sorted and in_flight[0/1] of
the request_queue when a request is completed. I take nr_sorted as the
number of pending requests and in_flight[0/1] as the number being
served by the storage. Do these two parameters stand for what I mean?

The benchmark I use is postmark, which simulates an email server
system; over 90% of the requests are small random writes. The storage
is an Intel M SSD. I would expect the value of in_flight[0/1] to be
much greater than 1, but the result shows that it is almost always 1,
no matter which I/O scheduler (CFQ/DEADLINE/NOOP) or filesystem
(EXT4/EXT3/BTRFS) is used. Is that normal?

B.T.W., I only run a single postmark process.

> The driver has to take care of this. Since requests are pulled by the
> driver, it knows when to stop asking for more work.
>
> BTW, your depth of 4 for the HDD seems a bit odd. Typically all SATA
> drives share the same queue depth, limited by what NCQ provides (32).
>

That number was arbitrary, just an example.

Thanks,
Yuehai

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:00   ` Yuehai Xu
@ 2011-01-07 13:10     ` Jens Axboe
  2011-01-07 13:23       ` Yuehai Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07 13:10 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu


Please don't top-post, thanks.

On 2011-01-07 14:00, Yuehai Xu wrote:
> I added a tracepoint so that I can get nr_sorted and in_flight[0/1]
> of the request_queue when a request is completed. I take nr_sorted as
> the number of pending requests and in_flight[0/1] as the number being
> served by the storage. Do these two parameters stand for what I mean?

nr_sorted is the number of requests that reside in the IO scheduler.
That means requests that are not on the dispatch list yet. in_flight is
the number that the driver is currently handling. So I think your
understanding is correct.

If you look at where you added your trace point, there is already a
trace point right there. I would recommend that you use blktrace, and
then use btt to parse the output. That will give you all sorts of
queueing information.
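A rough sketch of that workflow, for illustration (the device name
/dev/sdb and the 30-second capture window are assumptions; it needs
root and the blktrace tools installed):

```shell
# Sketch only: capture block-layer events while the benchmark runs, then
# let btt derive queueing statistics from dispatch (D) and completion (C)
# events. Guarded so it degrades gracefully where the tools are missing.
dev=/dev/sdb                                   # assumed device under test
if command -v btt >/dev/null 2>&1 && [ -b "$dev" ] && [ "$(id -u)" -eq 0 ]; then
    blktrace -d "$dev" -o trace -w 30          # trace for 30 seconds
    blkparse -i trace -d trace.bin >/dev/null  # merge per-CPU files into one binary stream
    result=$(btt -i trace.bin)                 # queueing/latency breakdown
else
    result="skipped: needs root, blktrace/btt, and a real $dev"
fi
echo "$result"
```

Running this alongside postmark should show the same in-flight numbers
as the hand-added tracepoint, without patching the kernel.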

> The benchmark I use is postmark, which simulates an email server
> system; over 90% of the requests are small random writes. The storage
> is an Intel M SSD. I would expect the value of in_flight[0/1] to be
> much greater than 1, but the result shows that it is almost always 1,
> no matter which I/O scheduler (CFQ/DEADLINE/NOOP) or filesystem
> (EXT4/EXT3/BTRFS) is used. Is that normal?

Depends, do you have more requests pending in the IO scheduler? I'm
assuming you already verified that NCQ is active and working for your
drive.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:10     ` Jens Axboe
@ 2011-01-07 13:23       ` Yuehai Xu
  2011-01-07 15:30         ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 13:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On Fri, Jan 7, 2011 at 8:10 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> Please don't top-post, thanks.

I am really sorry for that.

>
> On 2011-01-07 14:00, Yuehai Xu wrote:
>> I added a tracepoint so that I can get nr_sorted and in_flight[0/1]
>> of the request_queue when a request is completed. I take nr_sorted
>> as the number of pending requests and in_flight[0/1] as the number
>> being served by the storage. Do these two parameters stand for what
>> I mean?
>
> nr_sorted is the number of requests that reside in the IO scheduler.
> That means requests that are not on the dispatch list yet. in_flight is
> the number that the driver is currently handling. So I think your
> understanding is correct.
>
> If you look at where you added your trace point, there is already a
> trace point right there. I would recommend that you use blktrace, and
> then use btt to parse the output. That will give you all sorts of
> queueing information.

Yes, but I noticed that the existing trace points can't report
nr_sorted and in_flight[0/1] directly, so I just added a few lines. The
result is from blktrace, and I use blkparse to analyze it; that should
be the same as what you said about btt (I don't know that tool, sorry
about that).

>
>> The benchmark I use is postmark, which simulates an email server
>> system; over 90% of the requests are small random writes. The
>> storage is an Intel M SSD. I would expect the value of
>> in_flight[0/1] to be much greater than 1, but the result shows that
>> it is almost always 1, no matter which I/O scheduler
>> (CFQ/DEADLINE/NOOP) or filesystem (EXT4/EXT3/BTRFS) is used. Is
>> that normal?
>
> Depends, do you have more requests pending in the IO scheduler? I'm
> assuming you already verified that NCQ is active and working for your
> drive.
>

Yes, nr_sorted (the number of pending requests) remains around 100.
hdparm shows that NCQ is enabled.
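(For reference, a minimal sketch of that hdparm check; /dev/sdb is
assumed, and it needs root and hdparm installed:)

```shell
# Sketch: hdparm -I lists the drive's identification data; lines
# mentioning NCQ indicate Native Command Queueing support (a leading
# '*' in the Commands/features section marks enabled features).
dev=/dev/sdb                      # assumed device; adjust to your drive
if command -v hdparm >/dev/null 2>&1 && [ -b "$dev" ] && [ "$(id -u)" -eq 0 ]; then
    msg=$(hdparm -I "$dev" 2>/dev/null | grep -i ncq)
    [ -n "$msg" ] || msg="no NCQ lines reported for $dev"
else
    msg="skipped: needs root, hdparm, and a real $dev"
fi
echo "$msg"
```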

Thanks,
Yuehai

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 13:23       ` Yuehai Xu
@ 2011-01-07 15:30         ` Jens Axboe
  2011-01-07 16:45           ` Yuehai Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-01-07 15:30 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

On 2011-01-07 14:23, Yuehai Xu wrote:
> On Fri, Jan 7, 2011 at 8:10 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Please don't top-post, thanks.
> 
> I am really sorry for that.
> 
>>
>> On 2011-01-07 14:00, Yuehai Xu wrote:
>>> I added a tracepoint so that I can get nr_sorted and
>>> in_flight[0/1] of the request_queue when a request is completed. I
>>> take nr_sorted as the number of pending requests and in_flight[0/1]
>>> as the number being served by the storage. Do these two parameters
>>> stand for what I mean?
>>
>> nr_sorted is the number of requests that reside in the IO scheduler.
>> That means requests that are not on the dispatch list yet. in_flight is
>> the number that the driver is currently handling. So I think your
>> understanding is correct.
>>
>> If you look at where you added your trace point, there is already a
>> trace point right there. I would recommend that you use blktrace,
>> and then use btt to parse the output. That will give you all sorts
>> of queueing information.
> 
> Yes, but I noticed that the existing trace points can't report
> nr_sorted and in_flight[0/1] directly, so I just added a few lines.
> The result is from blktrace, and I use blkparse to analyze it; that
> should be the same as what you said about btt (I don't know that
> tool, sorry about that).

You don't need those values. btt can just look at dispatch and
completion events to get an exact queue depth number at any point in
time.

>>> The benchmark I use is postmark, which simulates an email server
>>> system; over 90% of the requests are small random writes. The
>>> storage is an Intel M SSD. I would expect the value of
>>> in_flight[0/1] to be much greater than 1, but the result shows
>>> that it is almost always 1, no matter which I/O scheduler
>>> (CFQ/DEADLINE/NOOP) or filesystem (EXT4/EXT3/BTRFS) is used. Is
>>> that normal?
>>
>> Depends, do you have more requests pending in the IO scheduler? I'm
>> assuming you already verified that NCQ is active and working for your
>> drive.
>>
> 
> Yes, nr_sorted (the number of pending requests) remains around 100.
> hdparm shows that NCQ is enabled.

I would double check that NCQ really is active, not just supported. For
instance, the controller needs to support it too. If you look at dmesg
from when it detects your drive, it should print the queue depth used.
Or you can check queue_depth in the sysfs scsi_device directory. It
should be 31 for NCQ enabled (32 in total, but one has to be reserved
for error handling), or 1 if it isn't.
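A minimal sketch of that sysfs check ("sda" is a placeholder device
name; this thread's drive is sdb, and the attribute only exists for
SCSI/ATA-class disks):

```shell
# Sketch: read the per-device queue_depth attribute from sysfs.
dev=sda                                   # placeholder; adjust to your drive
f=/sys/block/$dev/device/queue_depth
if [ -r "$f" ]; then
    depth=$(cat "$f")                     # expect 31 with NCQ active, 1 without
else
    depth="unknown ($f not present on this machine)"
fi
echo "queue_depth for $dev: $depth"
```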

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Who does determine the number of requests that can be serving simultaneously in a storage?
  2011-01-07 15:30         ` Jens Axboe
@ 2011-01-07 16:45           ` Yuehai Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Yuehai Xu @ 2011-01-07 16:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, cmm, rwheeler, vgoyal, czoccolo, yhxu

> You don't need those values. btt can just look at dispatch and
> completion events to get an exact queue depth number at any point in
> time.

Cool tool, I need to look into it further.

> I would double check that NCQ really is active, not just supported. For
> instance, the controller needs to support it too. If you look at dmesg
> from when it detects your drive, it should print the queue depth used.
> Or you can check queue_depth in the sysfs scsi_device directory. It
> should be 31 (32 in total, but one has to be reserved for error
> handling) for NCQ enabled, or 1 if it isn't.
>

You are right; info from dmesg:
[    1.476660] ata4.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 0/32)

and queue_depth is 1.

That proves that NCQ is not actually activated on this SSD. I get an
error message (bash: /sys/block/sdb/device/queue_depth: Permission
denied) when I run "echo 31 > /sys/block/sdb/device/queue_depth", even
though I already have root privileges.

Anyway, one thing is certain: it is because NCQ is not activated on
the SSD that the number of requests in flight is 1.

I really appreciate your help, thanks very much!

Yuehai

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-01-07 16:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-07  3:21 Who does determine the number of requests that can be serving simultaneously in a storage? Yuehai Xu
2011-01-07  5:16 ` Yuehai Xu
2011-01-07  8:21 ` Jens Axboe
2011-01-07 13:00   ` Yuehai Xu
2011-01-07 13:10     ` Jens Axboe
2011-01-07 13:23       ` Yuehai Xu
2011-01-07 15:30         ` Jens Axboe
2011-01-07 16:45           ` Yuehai Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox