From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paul Jackson
Date: Fri, 23 Sep 2005 07:00:16 +0000
Subject: Using Cpusets with HyperThreads
Message-Id: <20050923000016.2cc416ac.pj@sgi.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

This note explains the support provided by cpusets for job placement
on hyperthreaded CPUs in upcoming products, enabling one to control
what, if anything, can run on the A and B sides of each core.

How does this look?  Is the document itself clear and complete?  Will
the following serve your needs?  What's missing, wrong-headed or
useless?  Is there a better way we should consider?

The cpuset command and library technology currently shipping in the
latest ProPack 4 versions already includes the following technology,
so it is quite unlikely that we would remove any of this.  But there
may well be additional features, and improved documentation, that
would be useful.  Your feedback is welcome.


		Using Cpusets with HyperThreads
		===============================

In addition to their traditional use to control the placement of jobs
on the CPUs and Memory Nodes of a system, cpusets also provide a
convenient mechanism to control the use of hyperthreading (HT).

Some jobs achieve better performance using both of the hyperthread
sides, A and B, of a processor core, and some run better using just
one of the sides, letting the other side idle.

Since each logical (hyperthreaded) processor in a core has a distinct
CPU number, one can easily specify a cpuset that contains both sides,
or just one side, of each of the processor cores in the cpuset.
Cpusets can be configured to include any combination of the logical
CPUs in a system.

For example the cpuset configuration file:

    cpus 0-127:2    # the even numbered CPUs 0, 2, 4, ... 126
    mems 0-63       # all memory nodes 0, 1, 2, ... 63

would include the A sides of an HT enabled system, along with all the
memory, on the first 64 nodes.
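To make the list notation concrete, the following small Python sketch
expands a 'cpus' specification into the explicit CPU numbers it
denotes.  It is purely illustrative, not part of the cpuset tools, and
it assumes the usual Linux cpulist conventions (comma separated
ranges, optional ':stride' suffix):

```python
def expand_cpus(spec):
    """Expand a cpus list such as '0-127:2' into an explicit,
    sorted list of CPU numbers.

    Illustrative sketch only -- not part of the cpuset tools.
    Assumes comma separated entries, each a single CPU or a
    'lo-hi' range, with an optional ':stride' suffix."""
    cpus = set()
    for part in spec.split(","):
        stride = 1
        if ":" in part:
            part, s = part.split(":")
            stride = int(s)
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
        else:
            lo = hi = int(part)
        cpus.update(range(lo, hi + 1, stride))
    return sorted(cpus)

# The example configuration's A sides: every other CPU in 0-127.
print(expand_cpus("0-127:2"))   # 0, 2, 4, ... 126
```

Running this on the example above selects exactly the 64 even
numbered CPUs, i.e. the A sides on a uniformly numbered system.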
The colon ':' prefixes the stride.  The stride of '2' in this example
means use every other logical CPU.

The following commands would create a cpuset 'foo' according to the
above example, and run the job 'bar' in that cpuset, given that
'cpuset.cfg' is a file containing the above 2 example lines:

    cpuset -c /foo < cpuset.cfg    # create '/foo' on A sides
    cpuset -i /foo -I bar          # run 'bar' in cpuset /foo

To specify both sides of the first 64 cores, use:

    cpus 0-127

To specify just the B sides, use:

    cpus 1-127:2

The above assumes that CPUs are uniformly numbered, with the even
numbers for the A side and odd numbers for the B side.  This is
usually the case, but not guaranteed.  One could still place a job
on a system that was not uniformly numbered, but currently it would
involve a longer argument list to the 'cpus' option, explicitly
listing the desired CPUs.  When time permits, we can add more options
to the cpuset command and libcpuset C interfaces, to make it
convenient to manage hyperthread placement on non-uniformly numbered
systems.

We do not need to create a separate cpuset with just the B side CPUs
to avoid having something run there.  Tasks can only run where there
are cpusets allowing it.  If there is no cpuset for the B sides
except the all encompassing root cpuset, and if only root can put
tasks in that cpuset, then no one other than root can run on the B
sides.

The dplace command can be used to manage more detailed placement of
job tasks within such a cpuset.  Since dplace numbering of CPUs is
relative to the cpuset, the dplace configuration is unaffected by
whether the cpuset includes both sides of hyperthreaded cores, or
just one side, or is even on a system that does not support
hyperthreading.

Typically, the logical numbering of CPUs puts the even numbered CPUs
on the A sides, and the odd numbered CPUs on the B sides.  The stride
suffix (":2", above) makes it easy to specify that only every other
side will be used.
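For a system that is not uniformly numbered, one way to build the
explicit CPU list by hand is to pick one logical CPU per core from
the kernel's topology information.  The sketch below is illustrative
only (not part of the cpuset tools or libcpuset); it assumes input in
the format of the Linux sysfs files
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list, each a
cpulist naming the hyperthread siblings of one logical CPU:

```python
def one_side_per_core(sibling_lists):
    """Given each logical CPU's thread sibling list (cpulist
    strings such as '0-1' or '3,67'), return the lowest numbered
    CPU of each core.  The result is an explicit CPU list that
    could be given to a 'cpus' line even when the numbering is
    not uniform.  Illustrative sketch, not part of libcpuset."""
    cores = set()
    for siblings in sibling_lists:
        cpus = []
        for part in siblings.split(","):
            if "-" in part:
                lo, hi = (int(x) for x in part.split("-"))
                cpus.extend(range(lo, hi + 1))
            else:
                cpus.append(int(part))
        cores.add(min(cpus))        # keep one side of this core
    return sorted(cores)

# A non-uniformly numbered example: one core holds CPUs 0 and 64,
# the next holds CPUs 1 and 65.
print(one_side_per_core(["0,64", "0,64", "1,65", "1,65"]))
# -> [0, 1]
```

On a uniformly numbered system this reduces to the even numbered
CPUs, i.e. the same set the '0-127:2' stride notation selects.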
If the CPU number range starts with an even number, this will be the
A sides, and if the range starts with an odd number, this will be the
B sides.

Use the following steps, for example, to set up a job to run only on
the A sides of its hyperthreaded cores, and to ensure that nothing
runs on the B sides (they remain idle):

 1. The whole system is covered by a root cpuset (always the case).
 2. A boot cpuset is defined to keep the kernel, system daemon and
    user login session threads off the other CPUs.
 3. The sys admin or batch scheduler with root permission creates a
    cpuset that includes only the A sides of the processors to be
    used for this job.
 4. The sys admin or batch scheduler does not create any cpuset with
    the B side CPUs of these processors.

Then nothing disruptive ever runs on the corresponding B side CPUs.

This is different from cpusets on IRIX.  On IRIX, not all CPUs were
necessarily included in cpusets, and not all jobs were placed in
cpusets.  Jobs not in a cpuset could run without constraint on the
CPUs not in cpusets.  So, on IRIX, one would have to also create a
cpuset for the B sides, to ensure that other jobs did not run there.

The cpuset model for Linux 2.6 kernels is different.  If a site uses
a boot cpuset to confine the traditional Unix load, then nothing will
run on the other CPUs in the system, except when those CPUs are
included in a cpuset that has a job assigned to it.  These CPUs are
of course in the root cpuset, but this cpuset would normally only be
usable by a system administrator or batch scheduler with root
permissions.  This prevents anyone without root permission from
running a task on those CPUs, unless an administrator or service with
root permission allows it.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.925.600.0401