From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758044Ab2AKURP (ORCPT ); Wed, 11 Jan 2012 15:17:15 -0500 Received: from out02.mta.xmission.com ([166.70.13.232]:52524 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757543Ab2AKURL convert rfc822-to-8bit (ORCPT ); Wed, 11 Jan 2012 15:17:11 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: KOSAKI Motohiro Cc: Cyrill Gorcunov , Pavel Emelyanov , LKML , Andrew Morton , Kyle Moffett , Tejun Heo , Glauber Costa , Andi Kleen , Matt Helsley , Pekka Enberg , Eric Dumazet , Vasiliy Kulikov , Alexey Dobriyan , Herbert Xu , "David S. Miller" , Andrey Vagin Subject: Re: [RFC] on general object IDs again References: <20120111161939.GI8752@moon> <20120111175952.GI466@moon> <4F0DD365.6070200@parallels.com> <20120111183115.GA28196@moon> Date: Wed, 11 Jan 2012 12:19:23 -0800 In-Reply-To: (KOSAKI Motohiro's message of "Wed, 11 Jan 2012 14:29:48 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-XM-SPF: eid=;;;mid=;;;hst=in01.mta.xmission.com;;;ip=98.207.153.68;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/1jWZS0ZMHudzrWNdDkg3JE6h7ZBN6ktA= X-SA-Exim-Connect-IP: 98.207.153.68 X-SA-Exim-Mail-From: ebiederm@xmission.com X-SA-Exim-Scanned: No (on in01.mta.xmission.com); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org KOSAKI Motohiro writes: >>> > Then, you only need to compare. not any other calculation. i.e. only >>> > need id uniqueness. >>> > And any resource are referenced from tasks. so, can you reuse pid for >>> > this? example, >>> > two taska share one mm. >>> > >>> > task-a(pid: 100) >>> >               |-----------------mm >>> > task-b(pid: 200) >>> > >>> > >>> > gen_obj_id(task-b, GEN_OBJ_ID_VM) return 100. (youngest pid of referenced tasks) >>> >>> We can, but determining the youngest pid for an mm struct is O(N) algo. >>> Having N tasks with N mm_structs getting the sharing picture becomes O(N^2). >> >> Yeah, exactly. If not the speed problem we would simply stick >> with Andrew's proposal as two-id-are-the-same(pid1, pid2) >> syscall. > > Why O(N^2) is matter? Typical HPC system have mere a few hundred pids. > so, O(N^2) > is not slow. How do you mesure Andrew's proposal? > > If you have 1000 pids and each syscall need 10usec, > > 1000 * 1000 * 10 = 10,000,000usec = 10sec. But, important thing is, almost all > processes don't share fs, mm and other structs. then, if we check > reference count > before task traversal, required time may reduce 1/10x - 1/100x. A couple of things. 1) This is generally useful debug information so if we can we should design something that is enabled everywhere. 2) Something like avoiding a O(N^2) check is easy. Just store the pid when we create the structure. 3) Which leads to the very obvious question. Why don't we just assign an arbitrary id to all of the interesting structures? The code to do that is trivial and the implementation is secure in a way that hashing pointers never will be. I thought for a little bit that struct file might be too light weight for that to make sense but on second glance struct file is fat enough that I think we could squeeze in another 64 bits without noticeably bloating the structure. At a first case things like the mm struct really aren't interesting. 99+% of the time the sharing is well know and running around discovering it is just a waste of time. It is struct file that is really interesting. Even on the most trivial unix system struct file is shared between multiple processes and multiple open file descriptors in the same process. And it is struct file where there are a lot of them that the performance difference matters. .... Hmm. Come to think of it I don't think an object comparison system call is half as worthless as it has been made out to be. The justification for not doing an object comparison system call is that using an object comparison system call will result in a O(N^2) algorithm. For file descriptors we can avoid most of that by comparing the file the file descriptor points to, and the offset of the file descriptor. So we should not need an all to all comparison. In the second place the only reason why we would need O(N^2) complexity instead of O(NlogN) is if the comparison system call only compares for identity instead of returning a result that allows us to order the objects we are worrying about. Now the only possible problem with comparing pointers and returning the objects in memory order is that some evil attacker might be able to use that to their advantage. However the in memory order is generally derivable from the creation order of the objects so I don't think it is worth worry about at this point. Furthermore we can preserve the order of objects across a checkpoint by carefully controlling the order that the objects are created when we restore the system. Part of me wants to say: go assign an id to everything that is interesting, but now that I have thought through the comparison system call it is an obvious and trivial solution with no practical impact on the rest of the kernel. Guys please go back and implement a comparison system call that orders the objects and does not just compare for identity. Eric