xen-devel.lists.xenproject.org archive mirror
From: Keir Fraser <keir@xen.org>
To: Jan Beulich <JBeulich@suse.com>
Cc: Tim Deegan <tim@xen.org>, Olaf Hering <olaf@aepfle.de>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	George Shuklin <george.shuklin@gmail.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	xen-devel@lists.xen.org, Dario Faggioli <raistlin@linux.it>,
	Kurt Hackel <kurt.hackel@oracle.com>,
	Zhigang Wang <zhigang.x.wang@oracle.com>
Subject: Re: Proposed new "memory capacity claim" hypercall/feature
Date: Thu, 08 Nov 2012 09:12:30 +0000
Message-ID: <CCC127FE.51B0C%keir@xen.org>
In-Reply-To: <509B813B02000078000A7279@nat28.tlf.novell.com>

On 08/11/2012 08:54, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 08.11.12 at 09:18, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 08/11/2012 08:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>> 
>>>>>> On 07.11.12 at 23:17, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>>>> It appears that the attempt to use 2MB and 1GB pages is done in
>>>> the toolstack, and if the hypervisor rejects it, toolstack tries
>>>> smaller pages.  Thus, if physical memory is highly fragmented
>>>> (few or no order>=9 allocations available), this will result
>>>> in one hypercall per 4k page so a 256GB domain would require
>>>> 64 million hypercalls.  And, since AFAICT, there is no sane
>>>> way to hold the heap_lock across even two hypercalls, speeding
>>>> up the in-hypervisor allocation path, by itself, will not solve
>>>> the TOCTOU race.
>>> 
>>> No, even in the absence of large pages, the tool stack will do 8M
>>> allocations, just without requesting them to be contiguous.
>>> Whether 8M is a suitable value is another aspect; that value may
>>> predate hypercall preemption, and I don't immediately see why
>>> the tool stack shouldn't be able to request larger chunks (up to
>>> the whole amount at once).
>> 
>> It is probably to allow other dom0 processing (including softirqs) to
>> preempt the toolstack task, in the case that the kernel was not built with
>> involuntary preemption enabled (having it disabled is the common case I
>> believe?). 8M batches may provide enough returns to user space to allow
>> other work to get a look-in.
> 
> That may have mattered when ioctl-s were run with the big kernel
> lock held, but even 2.6.18 didn't do that anymore (using the
> .unlocked_ioctl field of struct file_operations), which means
> that even softirqs will get serviced in Dom0 since the preempted
> hypercall gets restarted via exiting to the guest (i.e. events get
> delivered). Scheduling is what indeed wouldn't happen, but if
> allocation latency can be brought down, 8M might turn out pretty
> small a chunk size.

Ah, then I am out of date on how Linux services softirqs and preemption? Can
softirqs/preemption occur any time, even in kernel mode, so long as no locks
are held?

I thought softirq-type work only happened during event servicing, and only if
the event servicing had interrupted user context (i.e., it would not happen if
started from within kernel mode). So the restart of the hypercall trap
instruction would be an opportunity to service hardirqs, but not softirqs or
the scheduler...

 -- Keir

> If we do care about Dom0-s running even older kernels (assuming
> there ever was a privcmd implementation that didn't use the
> unlocked path), or if we have to assume non-Linux Dom0-s might
> have issues here, making the tool stack behavior kernel kind/
> version dependent without strong need of course wouldn't sound
> very attractive.
> 
> Jan
> 
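
For reference, the hypercall counts discussed in the quoted text can be
checked with a short sketch. It assumes binary units (1 GB = 2^30 bytes) and
exactly one hypercall per chunk; the helper name hypercalls_needed is
hypothetical and is not part of any Xen or toolstack API.

```python
# Illustrative arithmetic only; nothing here calls into Xen.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def hypercalls_needed(domain_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks (ceiling division), assuming one hypercall per chunk."""
    return -(-domain_bytes // chunk_bytes)


domain = 256 * GIB  # the 256GB domain from the quoted example

counts = {
    "4 KiB pages": hypercalls_needed(domain, 4 * KIB),
    "2 MiB pages": hypercalls_needed(domain, 2 * MIB),
    "8 MiB batches": hypercalls_needed(domain, 8 * MIB),
    "1 GiB pages": hypercalls_needed(domain, 1 * GIB),
}

for label, n in counts.items():
    print(f"{label:>14}: {n:,} hypercalls")
```

Under these assumptions the 4k case comes to 2^26 calls (the "64 million"
figure above), while the 8M batches Jan describes need 2^15 calls for the
same domain.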

Thread overview: 58+ messages
2012-10-29 17:06 Proposed new "memory capacity claim" hypercall/feature Dan Magenheimer
2012-10-29 18:24 ` Keir Fraser
2012-10-29 21:08   ` Dan Magenheimer
2012-10-29 22:22     ` Keir Fraser
2012-10-29 23:03       ` Dan Magenheimer
2012-10-29 23:17         ` Keir Fraser
2012-10-30 15:13           ` Dan Magenheimer
2012-10-30 14:43             ` Keir Fraser
2012-10-30 16:33               ` Dan Magenheimer
2012-10-30  9:11         ` George Dunlap
2012-10-30 16:13           ` Dan Magenheimer
2012-10-29 22:35 ` Tim Deegan
2012-10-29 23:21   ` Dan Magenheimer
2012-10-30  8:13     ` Tim Deegan
2012-10-30 15:26       ` Dan Magenheimer
2012-10-30  8:29     ` Jan Beulich
2012-10-30 15:43       ` Dan Magenheimer
2012-10-30 16:04         ` Jan Beulich
2012-10-30 17:13           ` Dan Magenheimer
2012-10-31  8:14             ` Jan Beulich
2012-10-31 16:04               ` Dan Magenheimer
2012-10-31 16:19                 ` Jan Beulich
2012-10-31 16:51                   ` Dan Magenheimer
2012-11-02  9:01                     ` Jan Beulich
2012-11-02  9:30                       ` Keir Fraser
2012-11-04 19:43                         ` Dan Magenheimer
2012-11-04 20:35                           ` Tim Deegan
2012-11-05  0:23                             ` Dan Magenheimer
2012-11-05 10:29                               ` Ian Campbell
2012-11-05 14:54                                 ` Dan Magenheimer
2012-11-05 22:24                                   ` Ian Campbell
2012-11-05 22:58                                     ` Zhigang Wang
2012-11-05 22:58                                     ` Dan Magenheimer
2012-11-06 13:23                                       ` Ian Campbell
2012-11-05 22:33                             ` Dan Magenheimer
2012-11-06 10:49                               ` Jan Beulich
2012-11-05  9:16                           ` Jan Beulich
2012-11-07 22:17                             ` Dan Magenheimer
2012-11-08  7:36                               ` Keir Fraser
2012-11-08 10:11                                 ` Ian Jackson
2012-11-08 10:57                                   ` Keir Fraser
2012-11-08 21:45                                   ` Dan Magenheimer
2012-11-12 11:03                                     ` Ian Jackson
2012-11-08  8:00                               ` Jan Beulich
2012-11-08  8:18                                 ` Keir Fraser
2012-11-08  8:54                                   ` Jan Beulich
2012-11-08  9:12                                     ` Keir Fraser [this message]
2012-11-08  9:47                                       ` Jan Beulich
2012-11-08 10:50                                         ` Keir Fraser
2012-11-08 13:48                                           ` Jan Beulich
2012-11-08 19:16                                             ` Dan Magenheimer
2012-11-08 22:32                                               ` Keir Fraser
2012-11-09  8:47                                               ` Jan Beulich
2012-11-08 18:38                                 ` Dan Magenheimer
2012-11-05 17:14         ` George Dunlap
2012-11-05 18:21           ` Dan Magenheimer
2012-11-01  2:13   ` Dario Faggioli
2012-11-01 15:51     ` Dan Magenheimer
