From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752694Ab2HOW4I (ORCPT );
	Wed, 15 Aug 2012 18:56:08 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:35818 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751283Ab2HOW4H (ORCPT );
	Wed, 15 Aug 2012 18:56:07 -0400
Date: Wed, 15 Aug 2012 15:56:05 -0700
From: Andrew Morton 
To: Robin Holt 
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] SGI XPC fails to load when cpu 0 is out of IRQ resources.
Message-Id: <20120815155605.cf02ef7b.akpm@linux-foundation.org>
In-Reply-To: <20120803194628.GN3093@sgi.com>
References: <20120803194628.GN3093@sgi.com>
X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 3 Aug 2012 14:46:29 -0500 Robin Holt wrote:

> On many of our larger systems, CPU 0 has had all of its IRQ resources
> consumed before XPC loads.  Worse cases on machines with multiple
> 10 GigE cards and multiple IB cards have depleted the entire first
> socket of IRQs.  That patch makes selecting the node upon which
> IRQs are allocated (as well as all the other GRU Message Queue
> structures) specifiable as a module load param and has a default
> behavior of searching all nodes/cpus for an available resource.
>

Is this problem serious enough to warrant a -stable backport?  If you
want it to appear in vendor kernels then I guess "yes".
> +static int
> +xpc_init_mq_node(int nid)
> +{
> +	int cpu;
> +
> +	for_each_cpu(cpu, cpumask_of_node(nid)) {
> +		xpc_activate_mq_uv =
> +			xpc_create_gru_mq_uv(XPC_ACTIVATE_MQ_SIZE_UV, nid,
> +					     XPC_ACTIVATE_IRQ_NAME,
> +					     xpc_handle_activate_IRQ_uv);
> +		if (!IS_ERR(xpc_activate_mq_uv))
> +			break;
> +	}
> +	if (IS_ERR(xpc_activate_mq_uv))
> +		return PTR_ERR(xpc_activate_mq_uv);
> +
> +	for_each_cpu(cpu, cpumask_of_node(nid)) {
> +		xpc_notify_mq_uv =
> +			xpc_create_gru_mq_uv(XPC_NOTIFY_MQ_SIZE_UV, nid,
> +					     XPC_NOTIFY_IRQ_NAME,
> +					     xpc_handle_notify_IRQ_uv);
> +		if (!IS_ERR(xpc_notify_mq_uv))
> +			break;
> +	}
> +	if (IS_ERR(xpc_notify_mq_uv)) {
> +		xpc_destroy_gru_mq_uv(xpc_activate_mq_uv);
> +		return PTR_ERR(xpc_notify_mq_uv);
> +	}
> +
> +	return 0;
> +}

This seems to take the optimistic approach to CPU hotplug ;)

get_online_cpus(), perhaps?
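[Editor's note: a hedged sketch of what the get_online_cpus() suggestion
might look like applied to the quoted function.  The identifiers mirror
the patch above; the hotplug bracketing is an illustration of the review
comment, not the code that was actually merged.]

	/*
	 * Hypothetical sketch only: hold off CPU hotplug while walking
	 * cpumask_of_node(), so the node's online mask cannot change
	 * under us mid-iteration (the "optimistic approach" Andrew is
	 * pointing at).  get_online_cpus()/put_online_cpus() was the
	 * hotplug read-lock API in 2012-era kernels.
	 */
	static int
	xpc_init_mq_node(int nid)
	{
		int cpu;
		int ret = 0;

		get_online_cpus();	/* pin the set of online CPUs */

		for_each_cpu(cpu, cpumask_of_node(nid)) {
			xpc_activate_mq_uv =
				xpc_create_gru_mq_uv(XPC_ACTIVATE_MQ_SIZE_UV, nid,
						     XPC_ACTIVATE_IRQ_NAME,
						     xpc_handle_activate_IRQ_uv);
			if (!IS_ERR(xpc_activate_mq_uv))
				break;
		}
		if (IS_ERR(xpc_activate_mq_uv)) {
			ret = PTR_ERR(xpc_activate_mq_uv);
			goto out;
		}

		for_each_cpu(cpu, cpumask_of_node(nid)) {
			xpc_notify_mq_uv =
				xpc_create_gru_mq_uv(XPC_NOTIFY_MQ_SIZE_UV, nid,
						     XPC_NOTIFY_IRQ_NAME,
						     xpc_handle_notify_IRQ_uv);
			if (!IS_ERR(xpc_notify_mq_uv))
				break;
		}
		if (IS_ERR(xpc_notify_mq_uv)) {
			xpc_destroy_gru_mq_uv(xpc_activate_mq_uv);
			ret = PTR_ERR(xpc_notify_mq_uv);
		}
	out:
		put_online_cpus();	/* re-enable CPU hotplug */
		return ret;
	}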