public inbox for linux-kernel@vger.kernel.org
From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Wu Fengguang <wfg@linux.intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	"Vitaly V. Bursov" <vitalyb@telenet.dn.ua>,
	linux-kernel@vger.kernel.org
Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases
Date: Tue, 25 Nov 2008 15:03:06 +0300
Message-ID: <492BE97A.3050606@vlnb.net>
In-Reply-To: <20081125114908.GA16545@localhost>

Wu Fengguang wrote:
> On Tue, Nov 25, 2008 at 02:41:47PM +0300, Vladislav Bolkhovitin wrote:
>> Wu Fengguang wrote:
>>> On Tue, Nov 25, 2008 at 01:59:53PM +0300, Vladislav Bolkhovitin wrote:
>>>> Wu Fengguang wrote:
>>>>> Hi all,
>>>>>
>>>>> //Sorry for being late. 
>>>>>
>>>>> On Wed, Nov 12, 2008 at 08:02:28PM +0100, Jens Axboe wrote:
>>>>> [...]
>>>>>> I already talked about this with Jeff on irc, but I guess I should
>>>>>> post it here as well.
>>>>>>
>>>>>> nfsd aside (which does seem to have some different behaviour skewing the
>>>>>> results), the original patch came about because dump(8) has a really
>>>>>> stupid design that offloads IO to a number of processes. This basically
>>>>>> makes fairly sequential IO more random with CFQ, since each process gets
>>>>>> its own io context. My feeling is that we should fix dump instead of
>>>>>> introducing a fair bit of complexity (and slowdown) in CFQ. I'm not
>>>>>> aware of any other good programs out there that would do something
>>>>>> similar, so I don't think there's a lot of merit to spending cycles on
>>>>>> detecting cooperating processes.
>>>>>>
>>>>>> Jeff will take a look at fixing dump instead, and I may have promised
>>>>>> him that santa will bring him something nice this year if he does (since
>>>>>> I'm sure it'll be painful on the eyes).
>>>>> This could also be fixed at the VFS readahead level.
>>>>>
>>>>> In fact I've seen many kinds of interleaved accesses:
>>>>> - concurrently reading 40 files that are in fact hard links of one single file
>>>>> - a backup tool that splits a big file into 8k chunks, and serve the
>>>>>   {1, 3, 5, 7, ...} chunks in one process and the {0, 2, 4, 6, ...}
>>>>>   chunks in another one
>>>>> - a pool of NFSDs randomly serving some originally sequential read
>>>>>   requests - now dump(8) seems to have some similar problem.
>>>>>
>>>>> In summary there have been all kinds of efforts on trying to
>>>>> parallelize I/O tasks, but unfortunately they can easily screw up the
>>>>> sequential pattern. It may not be easily fixable for many of them.
>>>>>
>>>>> It is however possible to detect most of these patterns at the
>>>>> readahead layer and restore sequential I/Os, before they propagate
>>>>> into the block layer and hurt performance.
>>>> I believe this would be the most effective way to go, especially if
>>>> the data delivery path to the original client has its own latency
>>>> that depends on the amount of transferred data, as is the case with
>>>> a remote NFS mount, which does synchronous sequential reads. In this
>>>> case it is essential for performance to keep both links (local to
>>>> the storage and network to the client) always busy and transferring
>>>> data simultaneously. Since the reads are synchronous, the only way
>>>> to achieve that is to perform read ahead on the server sufficient to
>>>> cover the network link latency. Otherwise you would end up with only
>>>> half of the possible throughput.
>>>>
>>>> However, on the one hand, the server has to have a pool of
>>>> threads/processes to perform well, but, on the other hand, the
>>>> current read ahead code doesn't detect too well that those
>>>> threads/processes are doing a joint sequential read, so the read
>>>> ahead window gets smaller, hence the overall read performance gets
>>>> considerably smaller too.
>>>>
>>>>> Vitaly, if that's what you need, I can try to prepare a patch for testing out.
>>>> I can test it with the SCST SCSI target subsystem
>>>> (http://scst.sf.net). SCST needs such a feature very much, otherwise
>>>> it can't get full backstorage read speed. The maximum I can see is
>>>> about 80MB/s from a ~130MB/s 15K RPM disk over a 1Gbps iSCSI link
>>>> (maximum possible is ~110MB/s).
>>> Thank you very much!
>>>
>>> BTW, do you imply that the SCSI system (or its applications) has
>>> similar behaviors that the current readahead code cannot handle well?
>> No. The SCSI target subsystem is not the same as the SCSI initiator
>> subsystem, which is usually called simply the SCSI (sub)system. A SCSI
>> target is a SCSI server. It has as much in common with a SCSI
>> initiator as, e.g., Apache (HTTP server) has with Firefox (HTTP client).
> 
> Got it. So the SCSI server will split&spread sequential IO of one
> single file to cooperative threads?

Yes. It has to do so, because Linux doesn't have asynchronous cached IO
and a client can queue several tens of commands at a time. Then, on
sequential IO with one command at a time, the CPU scheduler comes into
play and spreads those commands over those threads, so read ahead gets
too small to cover the external link latency and keep both links filled
with data. The uncovered latency then kills throughput.
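
To put rough numbers on the latency argument, here is a minimal
back-of-the-envelope sketch. It is a simplified model, not a measurement:
it assumes the disk and link rates quoted earlier in the thread (~130MB/s
and ~110MB/s), ignores seek and protocol overhead, and the function name
throughput_mb_s is made up for illustration.

```python
def throughput_mb_s(disk_mb_s, link_mb_s, overlapped):
    """Steady-state throughput of a synchronous sequential read stream.

    Without sufficient read ahead, each chunk is first read from disk and
    then sent over the link, so the per-MB times add up. With enough read
    ahead the two stages overlap and the slower stage sets the rate.
    """
    disk_time = 1.0 / disk_mb_s   # seconds to read 1 MB from the disk
    link_time = 1.0 / link_mb_s   # seconds to send 1 MB over the link
    if overlapped:
        return 1.0 / max(disk_time, link_time)
    return 1.0 / (disk_time + link_time)


# Rates taken from the figures quoted in this thread (assumed, approximate).
serialized = throughput_mb_s(130.0, 110.0, overlapped=False)
pipelined = throughput_mb_s(130.0, 110.0, overlapped=True)
print(f"serialized: {serialized:.1f} MB/s, pipelined: {pipelined:.1f} MB/s")
```

Under these assumptions the fully serialized case comes out at about
59.6 MB/s against 110 MB/s when both links are kept busy, i.e. roughly
half of the possible throughput.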

> I'm trying to understand why the
> proposed page cache context based readahead would help a SCSI server.
> 
> Thanks,
> Fengguang
> 
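
The effect of spreading one sequential stream over several threads can be
illustrated with a toy model. This is only a sketch under stated
assumptions, not the kernel's readahead algorithm: commands are assumed to
be handed to threads round-robin, and count_sequential, CHUNK and THREADS
are hypothetical names chosen for the example.

```python
def count_sequential(requests, per_thread):
    """Count requests that start exactly where the previous one ended.

    requests: list of (thread_id, offset, length) in arrival order.
    per_thread=True keeps a separate "last end offset" for each thread;
    per_thread=False keeps a single one for the whole file.
    """
    last_end = {}
    hits = 0
    for thread_id, offset, length in requests:
        key = thread_id if per_thread else "file"
        if last_end.get(key) == offset:
            hits += 1
        last_end[key] = offset + length
    return hits


CHUNK = 64 * 1024   # assumed request size
THREADS = 4         # assumed size of the server thread pool

# One strictly sequential stream, spread round-robin over the threads.
requests = [(i % THREADS, i * CHUNK, CHUNK) for i in range(32)]

per_thread_hits = count_sequential(requests, per_thread=True)
per_file_hits = count_sequential(requests, per_thread=False)
print(per_thread_hits, per_file_hits)
```

In this model no single thread ever sees two adjacent requests (0 hits),
while a detector keyed on the file sees 31 of the 32 requests as
sequential.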

