Re: Slow file transfer speeds with CFQ IO scheduler in some cases

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Wu Fengguang <wfg@linux.intel.com>
To: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	"Vitaly V. Bursov" <vitalyb@telenet.dn.ua>,
	linux-kernel@vger.kernel.org
Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases
Date: Tue, 25 Nov 2008 19:49:08 +0800	[thread overview]
Message-ID: <20081125114908.GA16545@localhost> (raw)
In-Reply-To: <492BE47B.3010802@vlnb.net>

On Tue, Nov 25, 2008 at 02:41:47PM +0300, Vladislav Bolkhovitin wrote:
> Wu Fengguang wrote:
>> On Tue, Nov 25, 2008 at 01:59:53PM +0300, Vladislav Bolkhovitin wrote:
>>> Wu Fengguang wrote:
>>>> Hi all,
>>>>
>>>> //Sorry for being late. 
>>>>
>>>> On Wed, Nov 12, 2008 at 08:02:28PM +0100, Jens Axboe wrote:
>>>> [...]
>>>>> I already talked about this with Jeff on irc, but I guess should post it
>>>>> here as well.
>>>>>
>>>>> nfsd aside (which does seem to have some different behaviour skewing the
>>>>> results), the original patch came about because dump(8) has a really
>>>>> stupid design that offloads IO to a number of processes. This basically
>>>>> makes fairly sequential IO more random with CFQ, since each process gets
>>>>> its own io context. My feeling is that we should fix dump instead of
>>>>> introducing a fair bit of complexity (and slowdown) in CFQ. I'm not
>>>>> aware of any other good programs out there that would do something
>>>>> similar, so I don't think there's a lot of merrit to spending cycles on
>>>>> detecting cooperating processes.
>>>>>
>>>>> Jeff will take a look at fixing dump instead, and I may have promised
>>>>> him that santa will bring him something nice this year if he does (since
>>>>> I'm sure it'll be painful on the eyes).
>>>> This could also be fixed at the VFS readahead level.
>>>>
>>>> In fact I've seen many kinds of interleaved accesses:
>>>> - concurrently reading 40 files that are in fact hard links of one single file
>>>> - a backup tool that splits a big file into 8k chunks, and serve the
>>>>   {1, 3, 5, 7, ...} chunks in one process and the {0, 2, 4, 6, ...}
>>>>   chunks in another one
>>>> - a pool of NFSDs randomly serving some originally sequential read  
>>>> requests - now dump(8) seems to have some similar problem.
>>>>
>>>> In summary there have been all kinds of efforts on trying to
>>>> parallelize I/O tasks, but unfortunately they can easily screw up the
>>>> sequential pattern. It may not be easily fixable for many of them.
>>>>
>>>> It is however possible to detect most of these patterns at the
>>>> readahead layer and restore sequential I/Os, before they propagate
>>>> into the block layer and hurt performance.
>>> I believe this would be the most effective way to go, especially in 
>>> case  if data delivery path to the original client has its own 
>>> latency  depended from the amount of transferred data as it is in the 
>>> case of  remote NFS mount, which does synchronous sequential reads. 
>>> In this case  it is essential for performance to make both links 
>>> (local to the storage  and network to the client) be always busy and 
>>> transfer data  simultaneously. Since the reads are synchronous, the 
>>> only way to achieve  that is perform read ahead on the server 
>>> sufficient to cover the network  link latency. Otherwise you would 
>>> end up with only half of possible  throughput.
>>>
>>> However, from one side, server has to have a pool of 
>>> threads/processes  to perform well, but, from other side, current 
>>> read ahead code doesn't  detect too well that those threads/processes 
>>> are doing joint sequential  read, so the read ahead window gets 
>>> smaller, hence the overall read  performance gets considerably 
>>> smaller too.
>>>
>>>> Vitaly, if that's what you need, I can try to prepare a patch for testing out.
>>> I can test it with SCST SCSI target sybsystem (http://scst.sf.net). 
>>> SCST  needs such feature very much, otherwise it can't get full 
>>> backstorage  read speed. The maximum I can see is about ~80MB/s from 
>>> ~130MB/s 15K RPM  disk over 1Gbps iSCSI link (maximum possible is 
>>> ~110MB/s).
>>
>> Thank you very much!
>>
>> BTW, do you implicate that the SCSI system (or its applications) has
>> similar behaviors that the current readahead code cannot handle well?
>
> No. SCSI target subsystem is not the same as SCSI initiator subsystem,  
> which usually called simply SCSI (sub)system. SCSI target is a SCSI  
> server. It has the same amount of common with SCSI initiator as there  
> is, e.g., between Apache (HTTP server) and Firefox (HTTP client).

Got it. So the SCSI server will split&spread sequential IO of one
single file to cooperative threads? I'm trying to understand why the
proposed page cache context based readahead would help a SCSI server.

Thanks,
Fengguang

next prev parent reply	other threads:[~2008-11-25 11:49 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-09 18:04 Slow file transfer speeds with CFQ IO scheduler in some cases Vitaly V. Bursov
2008-11-09 18:30 ` Alexey Dobriyan
2008-11-09 18:32   ` Vitaly V. Bursov
2008-11-10 10:44 ` Jens Axboe
2008-11-10 13:51   ` Jeff Moyer
2008-11-10 13:56     ` Jens Axboe
2008-11-10 17:16       ` Vitaly V. Bursov
2008-11-10 17:35         ` Jens Axboe
2008-11-10 18:27           ` Vitaly V. Bursov
2008-11-10 18:29             ` Jens Axboe
2008-11-10 18:39               ` Jeff Moyer
2008-11-10 18:42               ` Jens Axboe
2008-11-10 21:51             ` Jeff Moyer
2008-11-11  9:34               ` Jens Axboe
2008-11-11  9:35                 ` Jens Axboe
2008-11-11 11:52                   ` Jens Axboe
2008-11-11 16:48                     ` Jeff Moyer
2008-11-11 18:08                       ` Jens Axboe
2008-11-11 16:53                     ` Vitaly V. Bursov
2008-11-11 18:06                       ` Jens Axboe
2008-11-11 19:36                         ` Jeff Moyer
2008-11-11 21:41                           ` Jeff Layton
2008-11-11 21:59                             ` Jeff Layton
2008-11-12 12:20                               ` Jens Axboe
2008-11-12 12:45                                 ` Jeff Layton
2008-11-12 12:54                                   ` Christoph Hellwig
2008-11-11 19:42                         ` Vitaly V. Bursov
2008-11-12 18:32       ` Jeff Moyer
2008-11-12 19:02         ` Jens Axboe
2008-11-13  8:51           ` Wu Fengguang
2008-11-13  8:54             ` Jens Axboe
2008-11-14  1:36               ` Wu Fengguang
2008-11-25 11:02                 ` Vladislav Bolkhovitin
2008-11-25 11:25                   ` Wu Fengguang
2008-11-25 15:21                   ` Jeff Moyer
2008-11-25 16:17                     ` Vladislav Bolkhovitin
2008-11-13 18:46             ` Vitaly V. Bursov
2008-11-25 10:59             ` Vladislav Bolkhovitin
2008-11-25 11:30               ` Wu Fengguang
2008-11-25 11:41                 ` Vladislav Bolkhovitin
2008-11-25 11:49                   ` Wu Fengguang [this message]
2008-11-25 12:03                     ` Vladislav Bolkhovitin
2008-11-25 12:09                       ` Vladislav Bolkhovitin
2008-11-25 12:15                         ` Wu Fengguang
2008-11-27 17:46                           ` Vladislav Bolkhovitin
2008-11-28  0:48                             ` Wu Fengguang
2009-02-12 18:35                               ` Vladislav Bolkhovitin
2009-02-13  1:57                                 ` Wu Fengguang
2009-02-13 20:08                                   ` Vladislav Bolkhovitin
2009-02-16  2:34                                     ` Wu Fengguang
2009-02-17 19:03                                       ` Vladislav Bolkhovitin
2009-02-18 18:14                                         ` Vladislav Bolkhovitin
2009-02-19  1:35                                         ` Wu Fengguang
2009-02-17 19:01                                   ` Vladislav Bolkhovitin
2009-02-19  2:05                                     ` Wu Fengguang
2009-03-19 17:44                                       ` Vladislav Bolkhovitin
2009-03-20  8:53                                         ` Vladislav Bolkhovitin
2009-03-23  1:42                                         ` Wu Fengguang
2009-04-21 18:18                                           ` Vladislav Bolkhovitin
2009-04-24  8:43                                             ` Wu Fengguang
2009-05-12 18:13                                               ` Vladislav Bolkhovitin
2009-02-17 19:01                                 ` Vladislav Bolkhovitin
2009-02-19  1:38                                   ` Wu Fengguang
2008-11-24 15:33           ` Jeff Moyer
2008-11-24 18:13             ` Jens Axboe
2008-11-24 18:50               ` Jeff Moyer
2008-11-24 18:51                 ` Jens Axboe
2008-11-13  6:54         ` Vitaly V. Bursov
2008-11-13 14:32           ` Jeff Moyer
2008-11-13 18:33             ` Vitaly V. Bursov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081125114908.GA16545@localhost \
    --to=wfg@linux.intel.com \
    --cc=jens.axboe@oracle.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vitalyb@telenet.dn.ua \
    --cc=vst@vlnb.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox