From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTP id 1D30FDDE43 for ; Sun, 11 Feb 2007 08:41:10 +1100 (EST)
Subject: Re: Discussion about iopa()
From: Benjamin Herrenschmidt
To: Dan Malek
In-Reply-To: <6CDAEEF1-B0ED-42E6-AA2C-6FD1CFCF462C@embeddedalley.com>
References: <989B956029373F45A0B8AF02970818900D444B@zch01exm26.fsl.freescale.net> <45CB28A6.3050607@freescale.com> <712E63F6-23D6-45EB-92F0-95656FF38BC4@embeddedalley.com> <1171075021.20494.0.camel@localhost.localdomain> <6CDAEEF1-B0ED-42E6-AA2C-6FD1CFCF462C@embeddedalley.com>
Content-Type: text/plain
Date: Sun, 11 Feb 2007 08:40:49 +1100
Message-Id: <1171143649.20494.31.camel@localhost.localdomain>
Mime-Version: 1.0
Cc: linuxppc-dev list , Timur Tabi
List-Id: Linux on PowerPC Developers Mail List

On Sat, 2007-02-10 at 13:04 -0500, Dan Malek wrote:
> On Feb 9, 2007, at 9:37 PM, Benjamin Herrenschmidt wrote:
>
> > We are fairly careful about not bloating fast path in general.
>
> This isn't any fast path code, and the way the
> exception handlers are growing it doesn't
> seem to be a concern anyway.

The 64-bit exception handlers are growing a bit due to some optional process time accounting stuff, and I'm not too happy with that growth. Apart from that, they aren't growing much and we are working hard to keep them in check. Do you have any specific example of the "growth" you are talking about?

> It is only a couple of memory accesses, even
> less code than the TLB exception handlers.

More specifically, it's two loads on the 2-level page tables we have on 32-bit. On 64-bit, however, page tables are 3 or 4 levels (64k or 4k page size), and thus it's 3 or 4 loads, which can be very significant if those are cache misses.
So yes, while it's quite cheap on embedded 32-bit CPUs that don't use HIGHMEM, it's not that good on other things, and thus might not be the best approach. I still think it's preferable to simply obtain the physical address along with the virtual one when allocating/creating an object (and thus have the allocator for those object types, like rheap for MURAM, return it, the same way the coherent DMA allocator does).

There's also another issue with iopa() that isn't obvious at first look: it's racy vs. page tables being disposed of on SMP machines (and possibly with preempt). We handle the race against hash misses on hash-based CPUs using the hash lock in pte_free(), but there is nothing in iopa() to deal with that. I don't think this is a problem with kernel mappings, though, but one should be careful.

> Using highmem has a price any time it's
> configured into a system, it's not unique in
> this case. In fact, in this case highmem
> shouldn't be a concern any different than
> the TLB exceptions.

True, but it's more expensive than keeping track of the physical address from allocation time.

> I just don't understand how such a trivial
> and useful function that does exactly what
> we need in a very clean way generates so
> much polarized discussion. I'm beginning
> to think it's just personal, since the only
> argument against it is "I don't like it" when
> the alternatives are just hacks at best that
> still need to be "fixed up someday." :-)

The alternatives aren't just hacks. The alternative that we recommend, and which is the way things are done in Linux, is to keep track of the physical address or the struct pages at allocation time.

> The Linux VM implementation just sucks.

This has very little to do with the Linux VM. Most if not all uses of iopa() are purely for kernel mappings, which are not handled by the core VM in most areas.
There are design choices in the Linux kernel memory management that you might not agree with, but just saying it "sucks" is neither useful nor constructive. If you think some aspects of Linux kernel memory handling should be done differently, you are very welcome to propose alternatives (with patches), though keep in mind that the way things are done now is actually very efficient from a performance standpoint and well adapted to the needs of most architectures.

> The majority of systems running this software
> aren't servers and desktop PCs, it's embedded
> SOCs with application specific peripherals.
> They have attributes and are mapped in ways
> that don't fit the "memory at 0" or "IO" model.
> We have to find solutions to this, together.

Yes, and finding solutions involves more than just saying "sucks" :-)

Now, I don't completely agree with you that there are "fundamental" limitations in the way memory is managed. First, let's get off the subject of "VM", as that term commonly refers to the memory management of user processes, which isn't what we are talking about (and which doesn't suffer from any of the "limitations" you mention anyway). What we are talking about here is the management of the kernel memory address space.

Some of the limitations you mention, like "memory at 0", are really limitations of specific architecture ports like x86 or powerpc, mostly because on those CPUs it makes little sense to do otherwise due to the way exceptions work; and even then, they aren't very hard to lift (see for example kdump, which runs the ppc64 kernel in a reserved area of memory at 32MB or so). Some of those "limitations" result from design choices that provide the best performance for 99% of usage scenarios, and though they might not be suitable for the most exotic hardware, that doesn't make them bad choices in the first place.

I do agree, however, that in some areas we can do better.
For example, the vmalloc/ioremap allocator should definitely be modified to be able to allocate from different areas/pools. That's something we could use on ppc64 as well, to replace our imalloc crap, and possibly on embedded to replace rheap.

I'm not sure what you mean by "IO model" here.

Ben.