From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 21 Aug 2013 03:15:07 -0400 (EDT)
From: Andrew Jones
Message-ID: <791179793.1629753.1377069307325.JavaMail.root@redhat.com>
In-Reply-To: <5214294D.9080707@cn.fujitsu.com>
References: <1376960839-13033-1-git-send-email-gaowanlong@cn.fujitsu.com>
 <1376960839-13033-8-git-send-email-gaowanlong@cn.fujitsu.com>
 <2079891071.1301559.1377006092956.JavaMail.root@redhat.com>
 <5214294D.9080707@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
To: gaowanlong@cn.fujitsu.com
Cc: aliguori@us.ibm.com, ehabkost@redhat.com, qemu-devel@nongnu.org,
 hutao@cn.fujitsu.com, peter huangpeng, lcapitulino@redhat.com,
 bsd@redhat.com, pbonzini@redhat.com, y-goto@jp.fujitsu.com,
 lersek@redhat.com, afaerber@suse.de

----- Original Message -----
> On 08/20/2013 09:41 PM, Andrew Jones wrote:
> >> +
> >> +     /* This is a workaround for a long standing bug in Linux'
> >> +      * mbind implementation, which cuts off the last specified
> >> +      * node. To stay compatible should this bug be fixed, we
> >> +      * specify one more node and zero this one out.
> >> +      */
> >> +     clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
> >> +     if (mbind(ram_ptr + ram_offset, len, bind_mode,
> >> +               numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
> >> +         perror("mbind");
> >> +         return -1;
> >> +     }
> >
> > From my quick read of this patch series, I think these two calls of
> > numa_num_configured_nodes() are the only places that libnuma is used.
> > Is it really worth the new dependency? Actually, libnuma only calculates
> > what it returns from numa_num_configured_nodes() once, because it simply
> > counts bits in a bitmask that it initializes at library load time. So
> > it would be more robust wrt node onlining/offlining to avoid libnuma
> > and to just fetch information from sysfs as needed anyway. In this
> > particular code though, I think replacing numa_num_configured_nodes()
> > with a maxnode, where
> >
> >     unsigned long maxnode = find_last_bit(numa_info[i].host_mem,
> >                                           MAX_CPUMASK_BITS)
>
> Sorry, I can't understand this, since numa_num_configured_nodes() is for
> the host, but why could we find the last bit of the guest setting to
> replace it?
>

You're not using numa_num_configured_nodes() to index _the_ host's
nodemask; you're using it to find the highest possible bit set in _a_
nodemask, numa_info[i].host_mem. mbind doesn't need its 'maxnode' param
to be the highest possible host node bit, but rather just the highest
bit set in the nodemask passed to it. find_last_bit will find that bit.
You still need to add 1 to it, as you do with numa_num_configured_nodes(),
due to the kernel erroneously decrementing it by one, as you've pointed
out in your comment.

drew