Re: Slow file transfer speeds with CFQ IO scheduler in some cases

From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Wu Fengguang <wfg@linux.intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	"Vitaly V. Bursov" <vitalyb@telenet.dn.ua>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases
Date: Tue, 17 Feb 2009 22:01:40 +0300
Message-ID: <499B0994.8040000@vlnb.net>
In-Reply-To: <20090213015721.GA5565@localhost>

Wu Fengguang, on 02/13/2009 04:57 AM wrote:
> On Thu, Feb 12, 2009 at 09:35:18PM +0300, Vladislav Bolkhovitin wrote:
>> Sorry for such a huge delay. There were many other things I had to
>> do first, and I had to be sure I didn't miss anything.
>>
>> We didn't use NFS; we used SCST (http://scst.sourceforge.net) with the
>> iSCSI-SCST target driver. Its architecture is similar to NFS: N
>> threads (N=5 in this case) handle IO from remote initiators (clients)
>> arriving over the wire via the iSCSI protocol. In addition, SCST has a
>> patch called export_alloc_io_context (see
>> http://lkml.org/lkml/2008/12/10/282), which allows the IO threads to
>> queue IO using a single IO context, so we can see whether context RA can
>> replace grouping the IO threads in a single IO context.
>>
>> Unfortunately, the results are negative. We found neither any advantage
>> of context RA over the current RA implementation, nor any possibility for
>> context RA to replace grouping the IO threads in a single IO context.
>>
>> The setup on the target (server) was the following: 2 SATA drives grouped
>> in an md RAID-0 with average local read throughput of ~120MB/s ("dd if=/dev/zero
>> of=/dev/md0 bs=1M count=20000" outputs "20971520000 bytes (21 GB)
>> copied, 177,742 s, 118 MB/s"). The md device was divided into 3
>> partitions. The first partition was 10% of the space at the beginning of
>> the device, the last partition was 10% of the space at the end of the
>> device, and the middle one was the rest of the space between them. Then the
>> first and the last partitions were exported to the initiator (client),
>> where they appeared as /dev/sdb and /dev/sdc respectively.
> 
> Vladislav, Thank you for the benchmarks! I'm very interested in
> optimizing your workload and figuring out what happens underneath.
> 
> Are the client and server two standalone boxes connected by GBE?
> 
> When you set readahead sizes in the benchmarks, are you setting them
> on the server side? I.e., is "linux-4dtq" the SCST server? What's the
> client-side readahead size?
> 
> It would help a lot to debug readahead if you can provide the
> server side readahead stats and trace log for the worst case.
> This will automatically answer the above questions as well as disclose
> the micro-behavior of readahead:
> 
>         mount -t debugfs none /sys/kernel/debug
> 
>         echo > /sys/kernel/debug/readahead/stats # reset counters
>         # do benchmark
>         cat /sys/kernel/debug/readahead/stats
> 
>         echo 1 > /sys/kernel/debug/readahead/trace_enable
>         # do micro-benchmark, i.e. run the same benchmark for a short time
>         echo 0 > /sys/kernel/debug/readahead/trace_enable
>         dmesg
> 
> The above readahead trace should help find out how the client side
> sequential reads convert into server side random reads, and how we can
> prevent that.

See attached. Could you please annotate the logs, so that I will also be
able to read them myself in the future?

Thank you,
Vlad


[-- Attachment #2: RA-debug.zip --]
[-- Type: application/zip, Size: 18593 bytes --]
