From: Avi Kivity <avi@redhat.com>
To: balbir@linux.vnet.ibm.com
Cc: Christoph Hellwig <hch@lst.de>, Chris Webb <chris@arachsys.com>,
KVM development list <kvm@vger.kernel.org>,
Rik van Riel <riel@surriel.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Date: Tue, 16 Mar 2010 17:59:40 +0200
Message-ID: <4B9FAAEC.1040604@redhat.com>
In-Reply-To: <20100316142739.GM18054@balbir.in.ibm.com>
On 03/16/2010 04:27 PM, Balbir Singh wrote:
>
>> Let's assume the guest has virtio (I agree with IDE we need
>> reordering on the host). The guest sends batches of I/O separated
>> by cache flushes. If the batches are smaller than the virtio queue
>> length, ideally things look like:
>>
>> io_submit(..., batch_size_1);
>> io_getevents(..., batch_size_1);
>> fdatasync();
>> io_submit(..., batch_size_2);
>> io_getevents(..., batch_size_2);
>> fdatasync();
>> io_submit(..., batch_size_3);
>> io_getevents(..., batch_size_3);
>> fdatasync();
>>
>> (certainly that won't happen today, but it could in principle).
>>
>> How does a write cache give any advantage? The host kernel sees
>> _exactly_ the same information as it would from a bunch of threaded
>> pwritev()s followed by fdatasync().
>>
>>
> Are you suggesting that the model with cache=writeback gives us the
> same I/O pattern as cache=none, so there are no opportunities for
> optimization?
>
Yes. The guest also has a large cache with the same optimization algorithm.
>
>
>> (wish: IO_CMD_ORDERED_FDATASYNC)
>>
>> If the batch size is larger than the virtio queue size, or if there
>> are no flushes at all, then yes the huge write cache gives more
>> opportunity for reordering. But we're already talking hundreds of
>> requests here.
>>
>> Let's say the virtio queue size was unlimited. What
>> merging/reordering opportunity are we missing on the host? Again we
>> have exactly the same information: either the pagecache lru + radix
>> tree that identifies all dirty pages in disk order, or the block
>> queue with pending requests that contains exactly the same
>> information.
>>
>> Something is wrong. Maybe it's my understanding, but on the other
>> hand it may be a piece of kernel code.
>>
>>
> I assume you are talking of dedicated disk partitions and not
> individual disk images residing on the same partition.
>
Correct. Images in files introduce new writes which can be optimized.
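
For illustration, the "threaded pwritev()s followed by fdatasync()" case from the quoted text can be sketched roughly as below. This is only a sketch under assumptions: it uses Python threads with os.pwrite() as a stand-in for io_submit()/io_getevents() rather than the Linux AIO interface itself, and the names BLOCK, write_batch, run and demo are hypothetical. Each batch is written concurrently, all completions are awaited, and only then is the flush issued, so the host kernel sees one unordered batch per flush either way.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # hypothetical request size, chosen only for the example


def write_batch(fd, batch):
    """Write every (offset, data) request of one batch concurrently, then flush.

    The thread pool plays the role of io_submit() + io_getevents(): the
    requests inside a batch carry no ordering, and we wait for all of them.
    """
    with ThreadPoolExecutor(max_workers=8) as pool:
        written = list(pool.map(lambda req: os.pwrite(fd, req[1], req[0]), batch))
    # Flush barrier between batches; fall back to fsync() where
    # fdatasync() is not available.
    getattr(os, "fdatasync", os.fsync)(fd)
    return sum(written)


def run(path, batches):
    """Process the batches strictly one after another, flushing after each."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        return [write_batch(fd, batch) for batch in batches]
    finally:
        os.close(fd)


def demo():
    """Three batches of 1, 2 and 3 block-sized writes against a scratch file."""
    batches = [
        [(i * BLOCK, bytes([n]) * BLOCK) for i in range(n)]
        for n in (1, 2, 3)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image")
        totals = run(path, batches)
        return totals, os.path.getsize(path)
```

Calling demo() returns the bytes written per batch and the final file size. Nothing in it depends on whether the writes went through the page cache or not, which is the point being argued above.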
--
error compiling committee.c: too many arguments to function