Subject: Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Date: Wed, 21 Aug 2013 15:23:40 +0800
To: Andrew Jones
Cc: aliguori@us.ibm.com, ehabkost@redhat.com, qemu-devel@nongnu.org,
    hutao@cn.fujitsu.com, peter huangpeng, lcapitulino@redhat.com,
    bsd@redhat.com, pbonzini@redhat.com, y-goto@jp.fujitsu.com,
    lersek@redhat.com, afaerber@suse.de, Wanlong Gao

On 08/21/2013 03:15 PM, Andrew Jones wrote:
>
>
> ----- Original Message -----
>> On 08/20/2013 09:41 PM, Andrew Jones wrote:
>>>> +
>>>> +    /* This is a workaround for a long standing bug in Linux'
>>>> +     * mbind implementation, which cuts off the last specified
>>>> +     * node. To stay compatible should this bug be fixed, we
>>>> +     * specify one more node and zero this one out.
>>>> +     */
>>>> +    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
>>>> +    if (mbind(ram_ptr + ram_offset, len, bind_mode,
>>>> +              numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
>>>> +        perror("mbind");
>>>> +        return -1;
>>>> +    }
>>>
>>> From my quick read of this patch series, I think these two calls of
>>> numa_num_configured_nodes() are the only places libnuma is used. Is it
>>> really worth the new dependency? libnuma actually calculates what it
>>> returns from numa_num_configured_nodes() only once, because it simply
>>> counts bits in a bitmask that it initializes at library load time. So
>>> it would be more robust with respect to node onlining/offlining to
>>> avoid libnuma and just fetch the information from sysfs as needed
>>> anyway. In this particular code, though, I think you can replace
>>> numa_num_configured_nodes() with a maxnode, where
>>>
>>>     unsigned long maxnode = find_last_bit(numa_info[i].host_mem,
>>>                                           MAX_CPUMASK_BITS);
>>
>> Sorry, I can't understand this: numa_num_configured_nodes() is about
>> the host, so why can we use the last bit of the guest's setting to
>> replace it?
>>
>
> You're not using numa_num_configured_nodes() to index _the_ host's
> nodemask; you're using it to find the highest possible bit set in _a_
> nodemask, numa_info[i].host_mem. mbind doesn't need its 'maxnode'
> param to be the highest possible host node bit, but rather just the
> highest bit set in the nodemask passed to it. find_last_bit will find
> that bit.
> You still need to add 1 to it as you do with
> numa_num_configured_nodes(), though, because the kernel erroneously
> decrements it by one, as you've pointed out in your comment.

Thank you very much for your explanation, I'll change it as you said. ;)

Regards,
Wanlong Gao

>
> drew
>
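
Read literally, Andrew's suggestion would turn the hunk quoted above into something like the sketch below. This is only an illustration of the idea, not necessarily the hunk that was eventually merged: ram_ptr, ram_offset, len, bind_mode, and numa_info[i].host_mem are the same variables as in the quoted patch, and find_last_bit()/clear_bit() are QEMU's bitops helpers from include/qemu/bitops.h.

    /* Derive maxnode from the nodemask itself rather than from libnuma:
     * find_last_bit() returns the 0-based index of the highest bit set
     * in host_mem, so lastbit + 1 is the number of bits in the mask.
     */
    unsigned long lastbit = find_last_bit(numa_info[i].host_mem,
                                          MAX_CPUMASK_BITS);
    unsigned long maxnode = lastbit + 1;

    /* Keep the original workaround for the mbind() bug that cuts off
     * the last specified node: specify one extra node and zero it out.
     */
    clear_bit(maxnode + 1, numa_info[i].host_mem);
    if (mbind(ram_ptr + ram_offset, len, bind_mode,
              numa_info[i].host_mem, maxnode + 1, 0)) {
        perror("mbind");
        return -1;
    }

The point of the change is that mbind()'s maxnode is now derived from the nodemask actually being passed, so it stays correct if host nodes are onlined or offlined after libnuma would have snapshotted its view at library load time, and the libnuma dependency can be dropped entirely.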