From mboxrd@z Thu Jan 1 00:00:00 1970 From: mingo@kernel.org (Ingo Molnar) Date: Fri, 18 Dec 2015 10:44:52 +0100 Subject: [PATCH 0/3] Batched user access support In-Reply-To: References: Message-ID: <20151218094452.GA32455@gmail.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org * Linus Torvalds wrote: > > So I already sent the end result of these three patches to the x86 people, > but since I *think* it may bve an arm64 issue too, I'm including the arm64 > people too for information. > > Background for the the arm64 people: I upgraded my main desktop to > Skylake, and did my usual build performance tests, including a perf run to > check that everything looks fine. Yes, the machine is 20% faster than my > old one, but the profile also shows that now that I have a CPU that > supports SMAP, the overhead of that on the user string handling functions > was horrendous. > > Normally, that probably isn't really noticeable, but on loads that do a > ton of pathname handling (like a "make -j" on the fully built kernel, or > doing "git diff" etc - both of which spend most of their time just doing > 'lstat()' on all the files they care about), the user space string > accesses really are pretty hot. > > On the 'make -j' test on a fully built kernel, strncpy_from_user() was > about 1.5% of all CPU time. And almost two thirds of that was just the > SMAP overhead. Just curious: by how much did the overhead shift after your patches? > So this patch series introduces a model for batching that SMAP overhead on > x86, and the reason the ARM people are involved is that the same _may_ be > true of the PAN overhead. I don't know - for all I know, the pstate "set > pan" instruction may be so cheap on ARM64 that it doesn't really matter. > > Thew new interface is very simple: new "unsafe_{get,put}_user()" functions > that have exactly the same semantics as the old unsafe ones (that weren't > called "unsafe", but have the two underscores). The only difference is > that you have to use "user_access_{begin,end}()" around them, which allows > the architecture to hoist the user access permission wrapper to outside > the loop, and then batch the raw accesses. > > The series contains this addition to uaccess.h: > > #ifndef user_access_begin > #define user_access_begin() do { } while (0) > #define user_access_end() do { } while (0) > #define unsafe_get_user(x, ptr) __get_user(x, ptr) > #define unsafe_put_user(x, ptr) __put_user(x, ptr) > #endif > > so architectures that don't care or haven't implemented it yet, don't need > to worry about it. Architectures that _do_ care just need to implement > their own versions, and make sure that user_access_begin is a macro (it > may obviously be an inline function and just then an additional > self-defining macro). > > Any comments? I like this interface, especially the method of naming it 'unsafe'. This draws review focus on them - and we had a number of security holes with the __get/put_user() 'fast' variants before, which are named in a too benign fashion. Btw., still today, when I use get_user()/put_user() APIs after a few weeks of not having coded such code, I have to look up their argument order in the headers ... while I don't have to look up memcpy() arguments. In hindsight, if we could go back 20 years, I'd organize our user memory access APIs in such a fashion: memcpy_begin() memcpy_user_kernel(userptr, kernelvar) memcpy_kernel_user(kernelvar, userptr) memcpy_user_kernel(userptr, kernelptr, size) memcpy_kernel_user(kernelptr, userptr, size) memcpy(kernelptr1, kernelptr2, size) memcpy_end() and I'd use the regular memcpy ordering. GCC macro magic would make a distinction between our 2-argument auto-sizing optimized APIs: memcpy_user_kernel(userptr, kernelvar) and the regular 3-argument sized memcpy: memcpy_user_kernel(userptr, kernelptr, size) i.e. generated could would still be the exact same as today, but the argument order is streamlined, and it matches the naming of the API: 'user_kernel' fits 'userptr, kernelvar'. It's also easy to read at a glance, every 'user_kernel' API conceptually maps to a regular C assignment: uservar = kernelvar; In fact we might want to harmonize the APIs a bit more and make the 2-argument APIs all pointer based: memcpy_user_kernel(userptr, kernelptr) memcpy_kernel_user(kernelptr, userptr) the pointers still have to be type compatible so I think it's still just as robust. There would be various other advantages to such a family of APIs as well: - while get/put matches the read/write IO logic, in reality it's very often mixed with memcpy and C variable assigment code, and the mixed and conflicting semantics are IMHO lethal to robustness. - with this it's all a single well-known pattern, throughout all major user memory access APIs. - subsequently it's also harder to mis-review what such code does and what security implications it has. - the naming space is easily extensible: does any code need memcpy_user_user() perhaps? Or memcpy_user_iomem()? - another dimension for intuitive extensions would be _nocache() variants for non-local copies. Variants of such APIs do exist, but the naming is not very well organized. But there are of course the disadvantages: - I guess we don't want to change thousands of get_user()/put_user() API usages ... We could automate 99.9% of it, and we could make it all graceful (i.e. keep the old APIs as well), but it would still be a big change. - [ I might have missed some advantage of the get/put idiom that my suggestion regresses. ] Changing this is something I'd love to volunteer for though, so I thought I'd throw in the offer ;-) Thanks, Ingo