From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Daniel Stodden <daniel.stodden@citrix.com>
Cc: "Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>
Subject: Re: blktap: Sync with XCP, dropping zero-copy.
Date: Wed, 17 Nov 2010 10:00:51 -0800
Message-ID: <4CE41853.1010000@goop.org>
In-Reply-To: <1289942932.11102.802.camel@agari.van.xensource.com>

On 11/16/2010 01:28 PM, Daniel Stodden wrote:
>> What's the problem?  If you do nothing then it will appear to the kernel
>> as a bunch of processes doing memory allocations, and they'll get
>> blocked/rate-limited accordingly if memory is getting short.  
> The problem is that just letting the page allocator work through
> allocations isn't going to scale anywhere.
>
> The worst-case memory requested under load is <number-of-disks> * (32 *
> 11 pages). As a (conservative) rule of thumb, N (the number of disks)
> will be 200 or rather more.

Under what circumstances would you end up needing to allocate that many
pages?

> The number of I/O actually in-flight at any point, in contrast, is
> derived from the queue/sg sizes of the physical device. For a simple
> disk, that's about a ring or two.

Wouldn't that be the worst case?
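
For scale, here is a rough sketch of the arithmetic behind the two quoted
figures. The 32 and 11 come from the quoted "32 * 11 pages"; the 4 KiB page
size and the reading of "a ring or two" as two rings' worth of requests are
assumptions, not something stated in this thread.

```python
PAGE_SIZE = 4096           # assumption: 4 KiB pages
RING_REQUESTS = 32         # from the quoted "32 * 11 pages"
SEGMENTS_PER_REQUEST = 11  # one page per segment, per the same figure


def ring_pages(rings=1):
    """Pages needed to back every segment of `rings` full rings."""
    return rings * RING_REQUESTS * SEGMENTS_PER_REQUEST


def to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_SIZE / (1024 * 1024)


worst_case = ring_pages(rings=200)  # N = 200 disks, all rings full
in_flight = ring_pages(rings=2)     # assumption: "a ring or two" == 2 rings

print(f"worst case: {worst_case} pages = {to_mib(worst_case):.1f} MiB")
print(f"in flight:  {in_flight} pages = {to_mib(in_flight):.2f} MiB")
```

Under those assumptions the gap is a factor of 100 between what could be
requested and what a simple disk can actually have in flight.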

>> There's
>> plenty of existing mechanisms to control that sort of thing (cgroups,
>> etc) without adding anything new to the kernel.  Or are you talking
>> about something other than simple memory pressure?
>>
>> And there's plenty of existing IPC mechanisms if you want them to
>> explicitly coordinate with each other, but I'd tend to think that's
>> premature unless you have something specific in mind.
>>
>>> Also, I was absolutely certain I once saw VM_FOREIGN support in gntdev..
>>> Can't find it now, what happened? Without, there's presently still no
>>> zero-copy.
>> gntdev doesn't need VM_FOREIGN any more - it uses the (relatively
>> new-ish) mmu notifier infrastructure which is intended to allow a device
>> to sync an external MMU with usermode mappings.  We're not using it in
>> precisely that way, but it allows us to wrangle grant mappings before
>> the generic code tries to do normal pte ops on them.
> The mmu notifiers were for safe teardown only. They are not sufficient
> for DIO, which wants gup() to work. If you want zcopy on gntdev, we'll
> need to back those VMAs with page structs.

The pages will have struct page, because they're normal kernel pages
which happen to be backed by mapped granted pages.  Are you talking
about the #ifdef CONFIG_XEN code in the middle of __get_user_pages()? 
Isn't that just there to cope with the nested-IO-on-the-same-page
problem that the current blktap architecture provokes?  If there's only
a single IO on each page - the one initiated by usermode - then it
shouldn't be necessary, right?

>   Or bounce again (gulp, just
> mentioning it). As with the blktap2 patches, note there is no difference
> in the dom0 memory bill, it takes page frames.

(And perhaps actual pages to substitute for the granted pages.)

> I guess we've been meaning the same thing here, unless I'm
> misunderstanding you. Any pfn does, and the balloon pagevec allocations
> default to order 0 entries indeed. Sorry, you're right, that's not a
> 'range'. With a pending re-xmit, the backend can find a couple (or all)
> of the request frames have count>1. It can flip and abandon those as
> normal memory. But it will need those lost memory slots back, straight
> away or next time it's running out of frames. As order-0 allocations.

Right.  GFP_KERNEL order 0 allocations are pretty reliable; they only
fail if the system is under extreme memory pressure.  And this has the
nice property that if those allocations block or fail, IO ingress from
domains gets rate-limited, rather than the backend being crushed by
memory pressure (i.e., the problem with trying to allocate memory in
the writeout path).
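
The rate-limiting effect can be illustrated with a small user-space analogy.
This is only a sketch: the FramePool class, the thread-per-request model and
the sleep standing in for disk IO are all hypothetical, and none of it is
blktap or kernel code. The point is just that when the allocation blocks,
the number of requests being serviced can never exceed the frames available.

```python
import threading
import time


class FramePool:
    """Hypothetical pool of page frames; alloc() blocks when it is empty."""

    def __init__(self, frames):
        self._free = threading.Semaphore(frames)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def alloc(self):
        self._free.acquire()  # blocks the caller until a frame is free
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)

    def free(self):
        with self._lock:
            self.in_use -= 1
        self._free.release()


def handle_request(pool):
    pool.alloc()              # ingress stalls here under pressure
    try:
        time.sleep(0.01)      # stand-in for the actual IO
    finally:
        pool.free()


def run_requests(requests, frames):
    """Submit `requests` concurrent requests; return peak frames in use."""
    pool = FramePool(frames)
    threads = [threading.Thread(target=handle_request, args=(pool,))
               for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.peak
```

Calling run_requests(requests=20, frames=4) completes all 20 requests while
the peak stays at or below 4; the excess requests simply wait at alloc().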

Also the cgroup mechanism looks like an extremely powerful way to
control the allocations for a process or group of processes to stop them
from dominating the whole machine.
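
As a sketch of what that could look like with the cgroup memory controller:
memory.limit_in_bytes and tasks are the controller's own control files, but
the helper name, the group directory and where the memory hierarchy is
mounted are assumptions here and vary by system.

```python
import os


def limit_group_memory(cgroup_dir, limit_bytes, pid):
    """Cap memory for a group and move one process into it.

    cgroup_dir is a directory inside a mounted memory-controller
    hierarchy (the mount point is an assumption and system-specific).
    """
    # On a real cgroup filesystem, mkdir creates the group.
    os.makedirs(cgroup_dir, exist_ok=True)

    # Upper bound on memory charged to the group, in bytes.
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # Writing a PID to "tasks" moves that task into the group.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))
```

For example, limit_group_memory("/cgroup/memory/tapdisk", 256 * 1024 * 1024,
pid) would cap a hypothetical group of tapdisk processes at 256 MiB, given
root privileges and a hierarchy mounted at that (assumed) path.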

    J
