* New NUMA scheduler and hotplug CPU
@ 2004-01-25 23:50 Rusty Russell
  2004-01-26  8:26 ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2004-01-25 23:50 UTC (permalink / raw)
To: piggin; +Cc: linux-kernel

Hi Nick!

	Looking at your new scheduler in -mm, it uses cpu_online_map
a lot in arch_init_sched_domains.  This means with hotplug CPU that it
would need to be modified: certainly possible to do, but messy.

	The other option is to use cpu_possible_map to create the full
topology up front, and then it need never change.  AFAICT, no other
changes are necessary: you already check against moving tasks to
offline cpus.

Anyway, I was just porting the hotplug CPU patches over to -mm, and
came across this, so I thought I'd ask.

Thanks!
Rusty.
-- 
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-25 23:50 New NUMA scheduler and hotplug CPU Rusty Russell
@ 2004-01-26  8:26 ` Nick Piggin
  2004-01-26 16:34   ` Martin J. Bligh
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2004-01-26 8:26 UTC (permalink / raw)
To: Rusty Russell, Martin J. Bligh; +Cc: linux-kernel

Rusty Russell wrote:

>Hi Nick!
>
>	Looking at your new scheduler in -mm, it uses cpu_online_map
>alot in arch_init_sched_domains.  This means with hotplug CPU that it
>would need to be modified: certainly possible to do, but messy.
>
>	The other option is to use cpu_possible_map to create the full
>topology up front, and then it need never change.  AFAICT, no other
>changes are neccessary: you already check against moving tasks to
>offline cpus.
>
>Anyway, I was just porting the hotplug CPU patches over to -mm, and
>came across this, so I thought I'd ask.
>

Hi Rusty,
Yes I'd like to use the cpu_possible_map to create the full
topology straight up. Martin?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26  8:26 ` Nick Piggin
@ 2004-01-26 16:34   ` Martin J. Bligh
  2004-01-26 23:01     ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-26 16:34 UTC (permalink / raw)
To: Nick Piggin, Rusty Russell; +Cc: linux-kernel

>> 	Looking at your new scheduler in -mm, it uses cpu_online_map
>> alot in arch_init_sched_domains.  This means with hotplug CPU that it
>> would need to be modified: certainly possible to do, but messy.
>>
>> 	The other option is to use cpu_possible_map to create the full
>> topology up front, and then it need never change.  AFAICT, no other
>> changes are neccessary: you already check against moving tasks to
>> offline cpus.
>>
>> Anyway, I was just porting the hotplug CPU patches over to -mm, and
>> came across this, so I thought I'd ask.
>>
>
> Hi Rusty,
> Yes I'd like to use the cpu_possible_map to create the full
> topology straight up. Martin?

Well isn't it a bad idea to have cpus in the data that are offline?
It'll throw off all your balancing calculations, won't it? You seemed
to be careful to do things like divide the total load on the node by
the number of CPUs on the node, and that'll get totally borked if you
have fake CPUs in there.

To me, it'd make more sense to add the CPUs to the scheduler structures
as they get brought online. I can also imagine machines where you have
a massive (infinite?) variety of possible CPUs that could appear -
like a NUMA box where you could just plug arbitrary numbers of new
nodes in as you wanted.

Moreover, as the CPUs aren't fixed numbers in advance, how are you going
to know which node to put them in, etc? Setting up every possible thing
in advance seems like an infeasible way to do hotplug to me.

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 16:34 ` Martin J. Bligh
@ 2004-01-26 23:01   ` Nick Piggin
  2004-01-26 23:24     ` Martin J. Bligh
  2004-01-26 23:40     ` Andrew Theurer
  0 siblings, 2 replies; 18+ messages in thread
From: Nick Piggin @ 2004-01-26 23:01 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Rusty Russell, linux-kernel

Martin J. Bligh wrote:

>>>	Looking at your new scheduler in -mm, it uses cpu_online_map
>>>alot in arch_init_sched_domains.  This means with hotplug CPU that it
>>>would need to be modified: certainly possible to do, but messy.
>>>
>>>	The other option is to use cpu_possible_map to create the full
>>>topology up front, and then it need never change.  AFAICT, no other
>>>changes are neccessary: you already check against moving tasks to
>>>offline cpus.
>>>
>>>Anyway, I was just porting the hotplug CPU patches over to -mm, and
>>>came across this, so I thought I'd ask.
>>>
>>Hi Rusty,
>>Yes I'd like to use the cpu_possible_map to create the full
>>topology straight up. Martin?
>>
>
>Well isn't it a bad idea to have cpus in the data that are offline?
>It'll throw off all your balancing calculations, won't it? You seemed
>to be careful to do things like divide the total load on the node by
>the number of CPUs on the node, and that'll get totally borked if you
>have fake CPUs in there.
>

I think it mostly does a good job at making sure to only take
online cpus into account. If there are places where it doesn't
then it shouldn't be too hard to fix.

>
>To me, it'd make more sense to add the CPUs to the scheduler structures
>as they get brought online. I can also imagine machines where you have
>a massive (infinite?) variety of possible CPUs that could appear -
>like an NUMA box where you could just plug arbitrary numbers of new
>nodes in as you wanted.
>

I guess so, but you'd still need NR_CPUS to be >= that arbitrary
number.

>
>Moreover, as the CPUs aren't fixed numbers in advance, how are you going
>to know which node to put them in, etc? Setting up every possible thing
>in advance seems like an infeasible way to do hotplug to me.
>

Well this would be the problem. I guess it's quite possible that
one doesn't know the topology of newly added CPUs beforehand.

Well OK, this would require a per-architecture function to handle
CPU hotplug. It could possibly just default to arch_init_sched_domains,
and just completely reinitialise everything, which would be the simplest.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:01 ` Nick Piggin
@ 2004-01-26 23:24   ` Martin J. Bligh
  2004-01-26 23:40     ` Nick Piggin
  2004-01-27  2:36     ` Rusty Russell
  1 sibling, 2 replies; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-26 23:24 UTC (permalink / raw)
To: Nick Piggin; +Cc: Rusty Russell, linux-kernel

>> Well isn't it a bad idea to have cpus in the data that are offline?
>> It'll throw off all your balancing calculations, won't it? You seemed
>> to be careful to do things like divide the total load on the node by
>> the number of CPUs on the node, and that'll get totally borked if you
>> have fake CPUs in there.
>
> I think it mostly does a good job at making sure to only take
> online cpus into account. If there are places where it doesn't
> then it shouldn't be too hard to fix.

It'd make the code a damned sight simpler and cleaner if you dropped
all that stuff, and updated the structures when you hotplugged a CPU,
which is really the only sensible way to do it anyway ...

For instance, if I remove cpu X, then bring back a new CPU on another node
(or in another HT sibling pair) as CPU X, then you'll need to update all
that stuff anyway. CPUs aren't in fixed positions in that map - the
ordering handed out is arbitrary.

>> To me, it'd make more sense to add the CPUs to the scheduler structures
>> as they get brought online. I can also imagine machines where you have
>> a massive (infinite?) variety of possible CPUs that could appear -
>> like an NUMA box where you could just plug arbitrary numbers of new
>> nodes in as you wanted.
>
> I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> number.

Yup ... but you don't have to enumerate all possible positions that way.
See Linus' argument re dynamic device numbers and iSCSI disks, etc.
Same thing applies.

> Well this would be the problem. I guess its quite possible that
> one doesn't know the topology of newly added CPUs before hand.
>
> Well OK, this would require a per architecture function to handle
> CPU hotplug. It could possibly just default to arch_init_sched_domains,
> and just completely reinitialise everything which would be the simplest.

Yeah, it's not trivially simple. But then neither is the rest of CPU
hotplug, to do it right ;-) Requiring CPU hotplug callback hooks does
seem to be the right way to interface with the sched code though ...

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:24 ` Martin J. Bligh
@ 2004-01-26 23:40   ` Nick Piggin
  2004-01-27  2:36   ` Rusty Russell
  1 sibling, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2004-01-26 23:40 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Rusty Russell, linux-kernel

Martin J. Bligh wrote:

>>>Well isn't it a bad idea to have cpus in the data that are offline?
>>>It'll throw off all your balancing calculations, won't it? You seemed
>>>to be careful to do things like divide the total load on the node by
>>>the number of CPUs on the node, and that'll get totally borked if you
>>>have fake CPUs in there.
>>>
>>I think it mostly does a good job at making sure to only take
>>online cpus into account. If there are places where it doesn't
>>then it shouldn't be too hard to fix.
>>
>
>It'd make the code a damned sight simpler and cleaner if you dropped
>all that stuff, and updated the structures when you hotplugged a CPU,
>which is really the only sensible way to do it anyway ...
>
>For instance, if I remove cpu X, then bring back a new CPU on another node
>(or in another HT sibling pair) as CPU X, then you'll need to update all
>that stuff anyway. CPUs aren't fixed position in that map - the ordering
>handed out is arbitrary.
>
>>>To me, it'd make more sense to add the CPUs to the scheduler structures
>>>as they get brought online. I can also imagine machines where you have
>>>a massive (infinite?) variety of possible CPUs that could appear -
>>>like an NUMA box where you could just plug arbitrary numbers of new
>>>nodes in as you wanted.
>>>
>>I guess so, but you'd still need NR_CPUS to be >= that arbitrary
>>number.
>>
>
>Yup ... but you don't have to enumerate all possible positions that way.
>See Linus' arguement re dynamic device numbers and ISCSI disks, etc.
>Same thing applies.
>
>>Well this would be the problem. I guess its quite possible that
>>one doesn't know the topology of newly added CPUs before hand.
>>
>>Well OK, this would require a per architecture function to handle
>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>and just completely reinitialise everything which would be the simplest.
>>
>
>Yeah, it's not trivially simple. But then neither is the rest of CPU
>hotplug, to do it right ;-) Requiring CPU hotplug callback hooks does
>seem to be the right way to interface with the sched code though ...
>

OK you've convinced me.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:24 ` Martin J. Bligh
  2004-01-26 23:40   ` Nick Piggin
@ 2004-01-27  2:36   ` Rusty Russell
  2004-01-27  4:38     ` Martin J. Bligh
  1 sibling, 1 reply; 18+ messages in thread
From: Rusty Russell @ 2004-01-27 2:36 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Nick Piggin, linux-kernel

In message <31860000.1075159471@flay> you write:
> > I think it mostly does a good job at making sure to only take
> > online cpus into account. If there are places where it doesn't
> > then it shouldn't be too hard to fix.
>
> It'd make the code a damned sight simpler and cleaner if you dropped
> all that stuff, and updated the structures when you hotplugged a CPU,
> which is really the only sensible way to do it anyway ...

No, actually, it wouldn't.  Take it from someone who has actually
looked at the code with an eye to doing this.

Replacing static structures by dynamic ones for an architecture which
doesn't yet exist is NOT a good idea.

> For instance, if I remove cpu X, then bring back a new CPU on another node
> (or in another HT sibling pair) as CPU X, then you'll need to update all
> that stuff anyway. CPUs aren't fixed position in that map - the ordering
> handed out is arbitrary.

Sure, if they were stupid they'd do it this way.

If (when) an architecture has hotpluggable CPUs and NUMA
characteristics, they probably will have fixed CPU *slots*, and number
CPUs based on what slot they are in.  Since the slots don't move, all
your fancy dynamic logic will be wasted.

When someone really has dynamic hotplug CPU capability with variable
attributes, *they* can code up the dynamic hierarchy.  Because *they*
can actually test it!

> > I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> > number.
>
> Yup ... but you don't have to enumerate all possible positions that way.
> See Linus' arguement re dynamic device numbers and ISCSI disks, etc.
> Same thing applies.

Crap.  When all the fixed per-cpu arrays have been removed from the
kernel, come back and talk about instantiation and location of
arbitrary CPUs.

You're way overdesigning: have you been sharing food with the AIX guys?

Cheers!
Rusty.
-- 
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  2:36 ` Rusty Russell
@ 2004-01-27  4:38   ` Martin J. Bligh
  2004-01-27  5:39     ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-27 4:38 UTC (permalink / raw)
To: Rusty Russell; +Cc: Nick Piggin, linux-kernel

> No, actually, it wouldn't.  Take it from someone who has actually
> looked at the code with an eye to doing this.
>
> Replacing static structures by dynamic ones for an architecture which
> doesn't yet exist is NOT a good idea.

Trying to force a dynamic infrastructure into the static bitmap arrays
that we have is the bad idea, IMHO. Why on earth would you want offline
CPUs in the scheduler domains? Just to make your coding easier? Sorry,
but that just doesn't cut it for me.

> Sure, if they were stupid they'd do it this way.
>
> If (when) an architecture has hotpluggable CPUs and NUMA
> characteristics, they probably will have fixed CPU *slots*, and number
> CPUs based on what slot they are in.  Since the slots don't move, all
> your fancy dynamic logic will be wasted.
>
> When someone really has dynamic hotplug CPU capability with variable
> attributes, *they* can code up the dynamic hierarchy.  Because *they*
> can actually test it!

The cpu numbers are now dynamically allocated tags. I don't see why
we should sacrifice that just to get cpu hotplug. Sure, it makes your
coding a little harder, but ....

>> Yup ... but you don't have to enumerate all possible positions that way.
>> See Linus' arguement re dynamic device numbers and ISCSI disks, etc.
>> Same thing applies.
>
> Crap. When all the fixed per-cpu arrays have been removed from the
> kernel, come back and talk about instantiation and location of
> arbitrary CPUS.
>
> You're way overdesigning: have you been sharing food with the AIX guys?

A cheap shot. Please, I'd expect better flaming from you.

Sorry if this makes your coding harder, but it seems clear to me that
it's the right way to go. I guess the final decision is up to Andrew,
but I really don't want to see this kind of stuff. You don't start
kthreads for every possible cpu, do you?

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  4:38 ` Martin J. Bligh
@ 2004-01-27  5:39   ` Nick Piggin
  2004-01-27  7:19     ` Martin J. Bligh
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2004-01-27 5:39 UTC (permalink / raw)
To: Martin J. Bligh, Rusty Russell; +Cc: linux-kernel

Martin J. Bligh wrote:

>>No, actually, it wouldn't.  Take it from someone who has actually
>>looked at the code with an eye to doing this.
>>
>>Replacing static structures by dynamic ones for an architecture which
>>doesn't yet exist is NOT a good idea.
>>
>
>Trying to force a dynamic infrastructure into the static bitmap arrays
>that we have is the bad idea, IMHO. Why on earth would you want offline
>CPUs in the scheduler domains? Just to make your coding easier? Sorry,
>but that just doesn't cut it for me.
>
>>Sure, if they were stupid they'd do it this way.
>>
>>If (when) an architecture has hotpluggable CPUs and NUMA
>>characteristics, they probably will have fixed CPU *slots*, and number
>>CPUs based on what slot they are in.  Since the slots don't move, all
>>your fancy dynamic logic will be wasted.
>>
>>When someone really has dynamic hotplug CPU capability with variable
>>attributes, *they* can code up the dynamic hierarchy.  Because *they*
>>can actually test it!
>>
>
>The cpu numbers are now dynamically allocated tags. I don't see why
>we should sacrifice that just to get cpu hotplug. Sure, it makes your
>coding a little harder, but ....
>
>>>Yup ... but you don't have to enumerate all possible positions that way.
>>>See Linus' arguement re dynamic device numbers and ISCSI disks, etc.
>>>Same thing applies.
>>>
>>Crap. When all the fixed per-cpu arrays have been removed from the
>>kernel, come back and talk about instantiation and location of
>>arbitrary CPUS.
>>
>>You're way overdesigning: have you been sharing food with the AIX guys?
>>
>
>A cheap shot. Please, I'd expect better flaming from you.
>
>Sorry if this makes your coding harder, but it seems clear to me that
>it's the right way to go. I guess the final decision is up to Andrew,
>but I really don't want to see this kind of stuff. You don't start
>kthreads for every possible cpu, do you?
>

Well let's not worry too much about this for now. We could use
static arrays and cpu_possible for now until we get a feel
for what specific architectures want.

To be honest I haven't seen the hotplug CPU code and I don't
know about what architectures want to be doing with it, so
this is my preferred direction just out of ignorance.

An easy next step toward a dynamic scheme would be just to
re-init the entire sched domain topology (the generic init uses
the generic NUMA topology info, which will have to be handled
by these architectures anyway). Modulo a small locking problem.

There aren't any fundamental design issues (with sched domains)
that I can see preventing a more dynamic system, so we can keep
that in mind.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  5:39 ` Nick Piggin
@ 2004-01-27  7:19   ` Martin J. Bligh
  2004-01-27 15:27     ` Martin J. Bligh
  0 siblings, 1 reply; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-27 7:19 UTC (permalink / raw)
To: Nick Piggin, Rusty Russell; +Cc: linux-kernel

> Well lets not worry too much about this for now. We could use
> static arrays and cpu_possible for now until we get a feel
> for what specific architectures want.
>
> To be honest I haven't seen the hotplug CPU code and I don't
> know about what architectures want to be doing with it, so
> this is my preferred direction just out of ignorance.
>
> An easy next step toward a dynamic scheme would be just to
> re-init the entire sched domain topology (the generic init uses
> the generic NUMA topology info which will have to be handled
> by these architectures anyway). Modulo a small locking problem.
>
> There aren't any fundamental design issues (with sched domains)
> that I can see preventing a more dynamic system so we can keep
> that in mind.

Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
why he's trying to do it that way now.

However, one other thought occurs to me ... it'd be good to use the
same infrastructure (sched domains) for the workload management stuff
as well (where the domains would be defined from userspace). That'd
also necessitate them being dynamic, if you think that'd work out as
a usage model.

The cpu_possible stuff might work for a first cut at hotplug I guess.
I still think it's ugly though ;-)

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  7:19 ` Martin J. Bligh
@ 2004-01-27 15:27   ` Martin J. Bligh
  2004-01-28  0:23     ` Rusty Russell
  0 siblings, 1 reply; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-27 15:27 UTC (permalink / raw)
To: Nick Piggin, Rusty Russell; +Cc: linux-kernel

> Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
> why he's trying to do it that way now.

BTW, Rusty - what are the locking rules for cpu_online_map under hotplug?
Is it RCU or something? The sched domains usage of it doesn't seem to take
any locks.

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27 15:27 ` Martin J. Bligh
@ 2004-01-28  0:23   ` Rusty Russell
  0 siblings, 0 replies; 18+ messages in thread
From: Rusty Russell @ 2004-01-28 0:23 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Nick Piggin, linux-kernel

In message <368660000.1075217230@[10.10.2.4]> you write:
> > Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
> > why he's trying to do it that way now.
>
> BTW, Rusty - what are the locking rules for cpu_online_map under hotplug?
> Is it RCU or something? The sched domains usage of it doesn't seem to take
> any locks.

The trivial usage is to take the cpucontrol sem (down_cpucontrol()).
There's a grace period between taking the cpu offline and actually
killing it too, so for most usages RCU is sufficient.  Fortunately,
I've yet to hit a case where this isn't sufficient.

For the scheduler there's an explicit "move all tasks off the CPU"
call which takes the tasklist lock and walks the tasks.

Cheers,
Rusty.
-- 
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:01 ` Nick Piggin
  2004-01-26 23:24   ` Martin J. Bligh
@ 2004-01-26 23:40   ` Andrew Theurer
  2004-01-27  0:07     ` Nick Piggin
  2004-01-27  0:09     ` Martin J. Bligh
  1 sibling, 2 replies; 18+ messages in thread
From: Andrew Theurer @ 2004-01-26 23:40 UTC (permalink / raw)
To: Nick Piggin, Martin J. Bligh; +Cc: Rusty Russell, linux-kernel

> >To me, it'd make more sense to add the CPUs to the scheduler structures
> >as they get brought online. I can also imagine machines where you have
> >a massive (infinite?) variety of possible CPUs that could appear -
> >like an NUMA box where you could just plug arbitrary numbers of new
> >nodes in as you wanted.
>
> I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> number.
>
> >Moreover, as the CPUs aren't fixed numbers in advance, how are you going
> >to know which node to put them in, etc? Setting up every possible thing
> >in advance seems like an infeasible way to do hotplug to me.
>
> Well this would be the problem. I guess its quite possible that
> one doesn't know the topology of newly added CPUs before hand.
>
> Well OK, this would require a per architecture function to handle
> CPU hotplug. It could possibly just default to arch_init_sched_domains,
> and just completely reinitialise everything which would be the simplest.

Call me crazy, but why not let the topology be determined via userspace
at a more appropriate time? When you hotplug, you tell it where in the
scheduler to plug it. Have structures in the scheduler which represent
the nodes-runqueues-cpus topology (in the past I tried node/rq/cpu
structs with simple pointers), but let the topology be built based on
the user's desires thru hotplug.

For example, you boot on just the boot cpu, which by default is in the
first node on the first runqueue. All other cpus, whether being "booted"
for the first time or hotplugged (maybe now there's really no
difference), the hotplugging tells where the cpu should be, in what node
and what runqueue. HT cpus work even better, because you can hotplug
siblings, one at a time if you wanted, to the same runqueue. Or you have
cpus sharing a die, same thing, lots of choices here. This removes any
per-arch updates to the kernel for things like scheduler topology, and
lets them be changed more easily somewhere else, like userspace.

Forgive me if this sounds stupid; I have not been following the
discussion closely.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:40 ` Andrew Theurer
@ 2004-01-27  0:07   ` Nick Piggin
  2004-01-27  2:21     ` Andrew Theurer
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2004-01-27 0:07 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Martin J. Bligh, Rusty Russell, linux-kernel

Andrew Theurer wrote:

>>>To me, it'd make more sense to add the CPUs to the scheduler structures
>>>as they get brought online. I can also imagine machines where you have
>>>a massive (infinite?) variety of possible CPUs that could appear -
>>>like an NUMA box where you could just plug arbitrary numbers of new
>>>nodes in as you wanted.
>>>
>>I guess so, but you'd still need NR_CPUS to be >= that arbitrary
>>number.
>>
>>>Moreover, as the CPUs aren't fixed numbers in advance, how are you going
>>>to know which node to put them in, etc? Setting up every possible thing
>>>in advance seems like an infeasible way to do hotplug to me.
>>>
>>Well this would be the problem. I guess its quite possible that
>>one doesn't know the topology of newly added CPUs before hand.
>>
>>Well OK, this would require a per architecture function to handle
>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>and just completely reinitialise everything which would be the simplest.
>>
>
>Call me crazy, but why not let the topology be determined via userspace at a
>more appropriate time? When you hotplug, you tell it where in the scheduler
>to plug it. Have structures in the scheduler which represent the
>nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu structs with
>simple pointers), but let the topology be built based on user's desires thru
>hotplug.
>

Well isn't userspace's idea of topology just what the kernel tells it?
I'm not sure what it would buy you... but I guess it wouldn't be too
much harder than doing it in kernel, just a matter of making the
userspace API.

BTW. I guess you haven't seen my sched domains code. It can describe
arbitrary topologies.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  0:07 ` Nick Piggin
@ 2004-01-27  2:21   ` Andrew Theurer
  2004-01-27  2:40     ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Theurer @ 2004-01-27 2:21 UTC (permalink / raw)
To: Nick Piggin; +Cc: Martin J. Bligh, Rusty Russell, linux-kernel

On Monday 26 January 2004 18:07, Nick Piggin wrote:
> >>Well OK, this would require a per architecture function to handle
> >>CPU hotplug. It could possibly just default to arch_init_sched_domains,
> >>and just completely reinitialise everything which would be the simplest.
> >
> >Call me crazy, but why not let the topology be determined via userspace at
> > a more appropriate time? When you hotplug, you tell it where in the
> > scheduler to plug it. Have structures in the scheduler which represent
> > the nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu
> > structs with simple pointers), but let the topology be built based on
> > user's desires thru hotplug.
>
> Well isn't userspace's idea of topology just what the kernel tells it?
> I'm not sure what it would buy you... but I guess it wouldn't be too
> much harder than doing it in kernel, just a matter of making the userspace
> API.

Sort of, the cpus to node mapping is pretty much what the kernel says it
is, but the cpu to runqueue mapping IMO is not a clear cut thing.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  2:21 ` Andrew Theurer
@ 2004-01-27  2:40   ` Nick Piggin
  0 siblings, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2004-01-27 2:40 UTC (permalink / raw)
To: habanero; +Cc: Martin J. Bligh, Rusty Russell, linux-kernel

Andrew Theurer wrote:

>On Monday 26 January 2004 18:07, Nick Piggin wrote:
>
>>>>Well OK, this would require a per architecture function to handle
>>>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>>>and just completely reinitialise everything which would be the simplest.
>>>>
>>>Call me crazy, but why not let the topology be determined via userspace at
>>>a more appropriate time? When you hotplug, you tell it where in the
>>>scheduler to plug it. Have structures in the scheduler which represent
>>>the nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu
>>>structs with simple pointers), but let the topology be built based on
>>>user's desires thru hotplug.
>>>
>>Well isn't userspace's idea of topology just what the kernel tells it?
>>I'm not sure what it would buy you... but I guess it wouldn't be too
>>much harder than doing it in kernel, just a matter of making the userspace
>>API.
>>
>
>Sort of, the cpus to node is pretty much what the kernel says it is, but the
>cpu to runqueue mapping IMO is not a clear cut thing.
>

But userspace still can't know more than the kernel tells it.

Apart from that, the SMT stuff in the sched domains patch means SMT
CPUs need not share runqueues.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-26 23:40 ` Andrew Theurer
  2004-01-27  0:07   ` Nick Piggin
@ 2004-01-27  0:09   ` Martin J. Bligh
  2004-01-27  2:19     ` Andrew Theurer
  1 sibling, 1 reply; 18+ messages in thread
From: Martin J. Bligh @ 2004-01-27 0:09 UTC (permalink / raw)
To: Andrew Theurer, Nick Piggin; +Cc: Rusty Russell, linux-kernel

> Call me crazy, but why not let the topology be determined via userspace at a
> more appropriate time? When you hotplug, you tell it where in the scheduler
> to plug it. Have structures in the scheduler which represent the
> nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu structs with
> simple pointers), but let the topology be built based on user's desires thru
> hotplug.

Well, I agree with the "at a more appropriate time" bit. But there's no
real need to make a bunch of complicated stuff out in userspace for this -
we're trying to lay out the scheduler domains according to the hardware
topology of the machine. It's not a userspace namespace or anything.
Having userspace fishing down way deep in hardware specific stuff is
silly - the kernel is there as a hardware abstraction layer.

Now if you wanted to use sched domains for workload management or
something and involve userspace, then yes ... that'd be more appropriate.

> For example, you boot on just the boot cpu, which by default is in the first
> node on the first runqueue. All other cpus, whether being "booted" for the
> for the first time or hotplugged (maybe now there's really no difference),
> the hotplugging tells where the cpu should be, in what node and what
> runqueue. HT cpus work even better, because you can hotplug siblings, once
> at a time if you wanted, to the same runqueue. Or you have cpus sharing a
> die, same thing, lots of choices here. This removes any per-arch updates to
> the kernel for things like scheduler topology, and lets them go somewhere
> else more easily changes, like userspace.

Ummm ... but *none* of that is dictated as policy stuff - it's all just
the hardware layout of the machine. You cannot "decide" as the sysadmin
which node a CPU is in, or which HT sibling it has. It's just there ;-)
The only thing you could possibly dictate is the CPU number you want
assigned to the new CPU, which frankly, I think is pointless - they're
arbitrary tags, and always have been.

M.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: New NUMA scheduler and hotplug CPU
  2004-01-27  0:09 ` Martin J. Bligh
@ 2004-01-27  2:19   ` Andrew Theurer
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Theurer @ 2004-01-27 2:19 UTC (permalink / raw)
To: Martin J. Bligh, Nick Piggin; +Cc: Rusty Russell, linux-kernel

On Monday 26 January 2004 18:09, Martin J. Bligh wrote:
> > For example, you boot on just the boot cpu, which by default is in the
> > first node on the first runqueue. All other cpus, whether being "booted"
> > for the for the first time or hotplugged (maybe now there's really no
> > difference), the hotplugging tells where the cpu should be, in what node
> > and what runqueue. HT cpus work even better, because you can hotplug
> > siblings, once at a time if you wanted, to the same runqueue. Or you
> > have cpus sharing a die, same thing, lots of choices here. This removes
> > any per-arch updates to the kernel for things like scheduler topology,
> > and lets them go somewhere else more easily changes, like userspace.
>
> Ummm ... but *none* of that is dictated as policy stuff - it's all just
> the hardware layout of the machine. You cannot "decide" as the sysadmin
> which node a CPU is in, or which HT sibling it has. It's just there ;-)
> The only thing you could possibly dictate is the CPU number you want
> assigned to the new CPU, which frankly, I think is pointless - they're
> arbitrary tags, and always have been.

How many cpus share a runqueue IMO could be a policy thing. Some HT cpus
may be better off sharing a runqueue where others (lots and lots of
siblings in one core) may not.

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2004-01-28  0:59 UTC | newest]