From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kamezawa Hiroyuki
Subject: pci device assignment and mm, KSM.
Date: Wed, 04 Jul 2012 15:16:37 +0900
Message-ID: <4FF3DFC5.2010409@jp.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Hugh Dickins, KOSAKI Motohiro, Minchan Kim
To: linux-mm
Return-path:
Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:57144 "EHLO
	fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932357Ab2GDGSv (ORCPT );
	Wed, 4 Jul 2012 02:18:51 -0400
Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74])
	by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 55BE73EE0B5
	for ; Wed, 4 Jul 2012 15:18:50 +0900 (JST)
Received: from smail (m4 [127.0.0.1])
	by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 30F2645DE53
	for ; Wed, 4 Jul 2012 15:18:50 +0900 (JST)
Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94])
	by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 1774145DE50
	for ; Wed, 4 Jul 2012 15:18:50 +0900 (JST)
Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1])
	by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 01BBDE08006
	for ; Wed, 4 Jul 2012 15:18:50 +0900 (JST)
Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134])
	by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id A6E2FE08002
	for ; Wed, 4 Jul 2012 15:18:49 +0900 (JST)
Sender: kvm-owner@vger.kernel.org
List-ID:

I'm sorry if my understanding is incorrect.

Here are some topics on PCI passthrough to guests.

When PCI passthrough is used with kvm, all of the guest's memory is
pinned by an extra reference count taken with get_page(). Those pinned
pages can never be reclaimed, cannot be moved by migration, and cannot
be merged by KSM.

Currently, the information that 'the page is pinned by kvm' is
represented only by page_count(). So there are the following problems.

a) The pages are on ANON_LRU. So try_to_free_pages() and kswapd will
   scan XX GB of pages hopelessly.
b) KSM cannot recognize that the pages are pinned in its early stage.
   So it breaks the transparent huge pages mapped by kvm into small
   pages. But in the end it fails to merge them because of the raised
   page_count(). So all hugepages are split without any benefit.

2 ideas for fixing this....

for a) I guess the pages should go to the UNEVICTABLE list. But they
       are not mlocked. I think we could use PagePinned() instead and
       move the pages to the UNEVICTABLE list. Then kswapd etc. will
       ignore pinned pages.

for b) At first, I thought qemu should call madvise(MADV_UNMERGEABLE).
       But I think the kernel may be able to handle the situation with
       an extra check, either PagePinned() or a flag in mm_struct.
       Should we avoid this in userland or in the kernel?

BTW, I think pinned pages cannot be freed until the kvm process exits.
Is that right?

Thanks,
-Kame
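
For reference, the userland option for b) could look roughly like the sketch below. It is only an illustration of the madvise(MADV_UNMERGEABLE) call, not qemu code: alloc_guest_ram() and exclude_from_ksm() are hypothetical names, and the anonymous mapping merely stands in for guest RAM. It assumes a Linux system; on a kernel built without CONFIG_KSM the madvise() call is expected to fail with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate anonymous memory standing in for guest RAM.
 * Returns NULL on failure.
 */
static void *alloc_guest_ram(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: ask the kernel not to consider [addr, addr + len)
 * for KSM merging. This undoes an earlier MADV_MERGEABLE on the range.
 * Returns 0 on success or a negative errno value
 * (-EINVAL is expected when the kernel has no KSM support).
 */
static int exclude_from_ksm(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_UNMERGEABLE) != 0)
        return -errno;
    return 0;
}
```

A caller would allocate the region, call exclude_from_ksm() on it before assigning the device, and treat -EINVAL as "KSM not available" rather than as a fatal error. Since KSM only scans ranges that were marked with MADV_MERGEABLE, simply not marking the guest memory as mergeable when a device is assigned would presumably have the same effect.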