Message-ID: <47F5BD6C.30100@goop.org>
Date: Thu, 03 Apr 2008 22:32:28 -0700
From: Jeremy Fitzhardinge
To: Dave Hansen
CC: KAMEZAWA Hiroyuki, Yasunori Goto, Ingo Molnar, LKML, Christoph Lameter
Subject: Re: [PATCH 5 of 6] hotplug-memory: add section_ops
References: <785fe0877fea0f488bc5.1207267545@localhost>
 <1207270272.943.27.camel@nimitz.home.sr71.net>
 <47F5809A.7040307@goop.org>
 <1207273978.943.82.camel@nimitz.home.sr71.net>
In-Reply-To: <1207273978.943.82.camel@nimitz.home.sr71.net>

Dave Hansen wrote:
> On Thu, 2008-04-03 at 18:12 -0700, Jeremy Fitzhardinge wrote:
>
>> Dave Hansen wrote:
>>
>>> On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
>>> I think it might just be nicer to have a global list of these handlers
>>> somewhere.  The Xen driver can just say "put me on the list of
>>> callbacks" and we'll call them at online_page().  I really don't think
>>> we need to be passing an ops structure around.
>>>
>>>
>> Yes, but it seems a bit awkward.  If we assume that:
>>
>>    1. Xen will be the only user of the hook, and
>>    2. Xen-balloon hotplug is exclusive of real memory hotplug
>>
>> then I guess it's reasonable (though if that's the case it would be
>> simpler to just put a direct call under #ifdef CONFIG_XEN_BALLOON in
>> there).
>>
>
> Yeah, I'm OK with something along those lines, too.  I'd prefer sticking
> some stubs in a header and putting the #ifdef there, if only for
> aesthetic reasons.
>

Sure.  But I think it's a very non-scalable approach; as soon as there's
a second user who wants to do something like this, it's worth moving to
an ops- or function-pointer-based path.

>> But if we think there can be multiple callbacks, and they all get called
>> on the online of each page, and there can be multiple kinds of hotplug
>> memory, it gets pretty messy.  Each has to determine "why was I called
>> on this page?" and you'd have to work out which one actually does the
>> job of onlining.  It just seems cleaner to say "this section needs to
>> be onlined like this", and there's no ambiguity.
>>
>
> I really wish we'd stop calling it "page online". :)
>
> Let me think out loud for a sec here.  Here's how memory hotplug works
> in a nutshell:
>
> First step (add_memory() or probe time):
> 1. get more memory made available
> 2. create kva mapping for that memory (for lowmem)
> 3. allocate 'struct pages'
>
> Second step, 'echo 1 > .../memoryXXX/online' time:
> 4. modify zone/pgdat spans (make the VM account for the memory)
> 5. initialize the 'struct page'
> 6. free the memory into the buddy allocator
>
> You can't do (2) because Xen doesn't allow mappings to be created until
> real backing is there.  You've already done this, right?
>
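(Just so we're reading the flow the same way: here's roughly how I map
those two phases onto the current entry points.  The balloon_* wrapper
names below are made up purely for illustration, and I'm quoting the
add_memory()/online_pages() signatures from memory, so treat it as a
sketch rather than anything definitive.)

#include <linux/memory_hotplug.h>

/* Phase one, steps 1-3: add_memory()/probe time.  This is the part the
 * balloon driver wants to call directly when it decides it needs more
 * struct pages. */
static int balloon_add_section(int nid, u64 start, u64 size)
{
	/* Registers the range, creates the kva mapping for lowmem (2)
	 * and allocates the struct pages (3). */
	return add_memory(nid, start, size);
}

/* Phase two, steps 4-6: normally driven by
 * "echo 1 > /sys/devices/system/memory/memoryXXX/online". */
static int balloon_online_section(unsigned long start_pfn,
				  unsigned long nr_pages)
{
	/* Grows the zone/pgdat spans (4), initializes each struct page
	 * (5), and frees the pages into the buddy allocator (6) -- the
	 * last bit being exactly what Xen can't do up front. */
	return online_pages(start_pfn, nr_pages);
}

The balloon driver really only ever wants the first half; the second
half it wants to do itself, a page at a time, as backing arrives.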
Well, it hasn't been an issue so far, because I've only been working
with x86-32, where all hotplug memory is highmem.  But, yes, on x86-64
and other architectures it would have to defer creating the mappings
until it gets the pages (page by page).

> You don't want to do (6) either, because there is no mapping for the
> page and it isn't committed in hardware, yet, so you don't want someone
> grabbing it *out* of the buddy allocator and using it.
>

Right.  I do that page by page in the balloon driver; each time I get a
machine page, I bind it to the corresponding page structure and free it
into the allocator.  And I skip the whole "echo online > /sys..." part,
because it's redundant: the use of hotplug memory is an internal
implementation detail of the balloon driver, which users needn't know
about when they deal with the balloon driver.

> Your solution is to take those first 1-5 steps, and have the balloon
> driver call them directly.  The online_page() modifications are because
> 5/6 are a bit intertwined right now.  Are we on the same page so far?
>

Right.

> Why don't we just free the page back into the balloon driver?  Take the
> existing steps 1-5, use them as they stand today, and just chop off step
> 6 for Xen.  It'd save a bunch of this code churn and also stop us from
> proliferating any kind of per-config section-online behavior like you're
> asking about above.
>

That's more or less what I've done.  I've grouped it as 1-4 when the
balloon driver decides it needs more page structures, then 5 & 6 page by
page when it actually gets some backing memory.

> That might also be generalizable to anyone else that wants the "fresh
> meat" newly-hotplugged memory.  Large page users might be other
> consumers here.
>

Sure.  The main problem is that steps 1-3 also end up implicitly
registering the new section with sysfs, so the bulk online interface
becomes visible to usermode.  If we make that optional (i.e., a separate
explicit call the memory-adder can choose to make), then everything is
rosy.

>> I'm already anticipating using the ops mechanism to support another
>> class of Xen hotplug memory for managing large pages.
>>
>
> Do tell. :)
>

I think I mentioned it before.  If we 1) modify Xen to manage domain
memory in large pages, and 2) have a reasonably small section size, then
we can reasonably do all memory management directly via the hotplug
interface.  Bringing each (large) page online would still require some
explicit action, but it would be a much closer fit to how the hotplug
machinery currently works.  Then a small user- or kernel-mode policy
daemon could use it to replicate the existing balloon driver's
functionality.

    J
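P.S. In case it helps make the "page by page" part above more concrete,
the per-page 5 & 6 I keep referring to amounts to roughly the following.
This is from memory rather than cut from the tree, so treat the details
(particularly the set_phys_to_machine() bookkeeping) as approximate:

#include <linux/mm.h>
#include <asm/xen/page.h>

/* Sketch: bind one freshly granted machine frame (mfn) to a struct page
 * that hotplug steps 1-4 already gave us, then do steps 5 & 6 for just
 * that page rather than onlining a whole section. */
static void balloon_online_one_page(struct page *page, unsigned long mfn)
{
	unsigned long pfn = page_to_pfn(page);

	/* Record the pseudo-physical -> machine binding for this frame. */
	set_phys_to_machine(pfn, mfn);

	/* Step 5: make it look like an ordinary free page... */
	ClearPageReserved(page);
	init_page_count(page);

	/* ...and step 6: hand it to the buddy allocator. */
	__free_page(page);
}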