* HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
@ 2018-01-04 22:32 Vinson Lee
  2018-01-05 16:32 ` Bart Van Assche
  2018-01-14 23:40 ` Laurence Oberman
  0 siblings, 2 replies; 10+ messages in thread
From: Vinson Lee @ 2018-01-04 22:32 UTC
  To: linux-scsi, Don Brace

Hi.

HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
prompt and hangs with Linux 4.13 or later. I cannot log in on console
or SSH into the machine. Linux 4.12 and older boot fine.

I see these messages on the console.

[  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
[  242.877835]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.909228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.945404] INFO: task xfsaild/sda2:625 blocked for more than 120 seconds.
[  242.945407]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.945410] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.945896] INFO: task kworker/u130:4:1023 blocked for more than 120 seconds.
[  242.945897]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.945897] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.946449] INFO: task modprobe:1550 blocked for more than 120 seconds.
[  242.946450]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.946450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.946943] INFO: task postfix:1704 blocked for more than 120 seconds.
[  242.946946]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.946948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.947429] INFO: task (xinit.sh):1989 blocked for more than 120 seconds.
[  242.947432]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  242.947434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.674387] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
[  363.707741]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  363.738601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.774098] INFO: task xfsaild/sda2:625 blocked for more than 120 seconds.
[  363.804996]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  363.833565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.869380] INFO: task kworker/u130:4:1023 blocked for more than 120 seconds.
[  363.901795]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  363.930403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.966228] INFO: task modprobe:1550 blocked for more than 120 seconds.
[  363.966231]       Not tainted 4.15.0-041500rc6-generic #201712312330
[  363.966233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Cheers,
Vinson

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-04 22:32 HP ProLiant DL360p Gen8 hangs with Linux 4.13+ Vinson Lee
@ 2018-01-05 16:32 ` Bart Van Assche
  2018-01-06 20:45   ` Laurence Oberman
  2018-01-11  0:52   ` Vinson Lee
  2018-01-14 23:40 ` Laurence Oberman
  1 sibling, 2 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-05 16:32 UTC
  To: linux-scsi@vger.kernel.org, don.brace@microsemi.com, vlee@freedesktop.org

On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> prompt and hangs with Linux 4.13 or later. I cannot log in on console
> or SSH into the machine. Linux 4.12 and older boot fine.
>
> I see these messages on the console.
>
> [  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
> [  242.877835]       Not tainted 4.15.0-041500rc6-generic #201712312330

It seems like something got stuck in the block layer. The traditional way to
debug this is to analyze the information that is available under
/sys/kernel/debug/block. However, since login is not possible we can't use
that approach. Would it be possible for you to check whether this has been
resolved in kernel v4.15-rc6, and if not, bisect this?

Thanks,

Bart.
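
Editorial sketch (not part of the thread): on a machine that can still be logged in to, the debugfs inspection Bart mentions could be approximated by dumping every readable file below a device's directory under /sys/kernel/debug/block. The helper name `dump_tree` and the device name `sda` are assumptions made for illustration; debugfs must be mounted and root access is normally required.

```sh
#!/bin/sh
# Hypothetical helper: print every readable regular file below a directory
# as "path: content". Meant for a blk-mq debugfs directory such as
# /sys/kernel/debug/block/sda (device name assumed for illustration).
dump_tree() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -type f | sort | while IFS= read -r f; do
        # Some debugfs files are write-only or fail on read; skip those.
        content=$(cat "$f" 2>/dev/null) || continue
        printf '%s: %s\n' "$f" "$content"
    done
}

# Example invocation (as root), left commented out on purpose:
#   dump_tree /sys/kernel/debug/block/sda
```

Saving the output before and after a stall begins makes it easier to compare the two states, but as noted in the message above this only helps when the machine still accepts a login.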

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-05 16:32 ` Bart Van Assche
@ 2018-01-06 20:45   ` Laurence Oberman
  2018-01-11  0:52   ` Vinson Lee
  1 sibling, 0 replies; 10+ messages in thread
From: Laurence Oberman @ 2018-01-06 20:45 UTC
  To: Bart Van Assche, linux-scsi@vger.kernel.org, don.brace@microsemi.com, vlee@freedesktop.org

On Fri, 2018-01-05 at 16:32 +0000, Bart Van Assche wrote:
> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> > HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> > prompt and hangs with Linux 4.13 or later. I cannot log in on
> > console
> > or SSH into the machine. Linux 4.12 and older boot fine.
> >
> > I see these messages on the console.
> >
> > [  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120
> > seconds.
> > [  242.877835]       Not tainted 4.15.0-041500rc6-generic
> > #201712312330
>
> It seems like something got stuck in the block layer. The traditional
> way to
> debug this is to analyze the information that is available under
> /sys/kernel/debug/block. However, since login is not possible we
> can't use
> that approach. Would it be possible for you to check whether this has
> been
> resolved in kernel v4.15-rc6, and if not, bisect this?
>
> Thanks,
>
> Bart.

One way to debug this, given it's an HP DL380, is to follow these steps:

1. Boot the working kernel.
2. Ensure kdump is activated and running on boot.
3. Add these to the /etc/sysctl.conf file:
   kernel.panic_on_io_nmi = 1
   kernel.panic_on_unrecovered_nmi = 1
   kernel.unknown_nmi_panic = 1
4. Once the machine hangs after booting the failing kernel, go to the
   iLO page under admin/diagnostics and press the Virtual NMI button to
   generate a vmcore.

When you have a vmcore, I will give you a place to upload it so I
can look at it.
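
Editorial sketch (not part of the thread): step 3 above could be scripted roughly as follows. The function name `add_nmi_panic_settings` is hypothetical; the three sysctl keys are the ones listed in the message. The function only appends lines that are not already present, so running it twice is harmless.

```sh
#!/bin/sh
# Hypothetical helper: append the three NMI-related settings from the
# message above to a sysctl configuration file, skipping any line that
# is already present verbatim.
add_nmi_panic_settings() {
    conf=$1
    touch "$conf" || return 1
    for setting in \
        "kernel.panic_on_io_nmi = 1" \
        "kernel.panic_on_unrecovered_nmi = 1" \
        "kernel.unknown_nmi_panic = 1"
    do
        # -x matches whole lines only, -F treats the setting as a fixed string.
        grep -qxF "$setting" "$conf" || echo "$setting" >> "$conf"
    done
}

# Typical use (as root), left commented out on purpose:
#   add_nmi_panic_settings /etc/sysctl.conf
#   sysctl -p /etc/sysctl.conf    # load the settings without rebooting
```

With these settings in place and kdump armed, an NMI injected from the management controller should make the kernel panic and write a vmcore instead of ignoring the event.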

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-05 16:32 ` Bart Van Assche
  2018-01-06 20:45   ` Laurence Oberman
@ 2018-01-11  0:52   ` Vinson Lee
  2018-01-17  0:17     ` Vinson Lee
  1 sibling, 1 reply; 10+ messages in thread
From: Vinson Lee @ 2018-01-11 0:52 UTC
  To: Bart Van Assche, Thomas Gleixner, Christoph Hellwig
  Cc: linux-scsi@vger.kernel.org, don.brace@microsemi.com

On Fri, Jan 5, 2018 at 8:32 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
>> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
>> prompt and hangs with Linux 4.13 or later. I cannot log in on console
>> or SSH into the machine. Linux 4.12 and older boot fine.
>>
>> I see these messages on the console.
>>
>> [  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
>> [  242.877835]       Not tainted 4.15.0-041500rc6-generic #201712312330
>
> It seems like something got stuck in the block layer. The traditional way to
> debug this is to analyze the information that is available under
> /sys/kernel/debug/block. However, since login is not possible we can't use
> that approach. Would it be possible for you to check whether this has been
> resolved in kernel v4.15-rc6, and if not, bisect this?
>
> Thanks,
>
> Bart.

Hi.

The machine still hangs with Linux 4.15-rc6.

I did a bisect. The hang is introduced with Linux 4.13-rc1 commit
c5cb83bb337c25caae995d992d1cdf9b317f83de "genirq/cpuhotplug: Handle
managed IRQs on CPU hotplug".

There is a startup script that disables hyperthreading by offlining
sibling CPUs.

for CPU in $(cut -s -d, -f2 $SYS_PATH/cpu*/topology/thread_siblings_list | sort -un); do
    echo 0 > /sys/devices/system/cpu/cpu$CPU/online
done

If the above script is not run, the machine does not hang with Linux 4.13.

Cheers,
Vinson
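
Editorial sketch (not part of the thread): the startup script above can be split into a part that only lists the sibling CPUs and a part that prints the writes it would perform, which makes it easier to check what would be offlined before touching the machine. The function names are hypothetical, and /sys/devices/system/cpu is an assumption for what SYS_PATH pointed to in the original script.

```sh
#!/bin/sh
# Hypothetical helper: list the second hardware thread of every core,
# using the same cut/sort pipeline as the startup script quoted above.
# $1 is the sysfs CPU directory (assumed default: /sys/devices/system/cpu).
secondary_siblings() {
    sys_path=${1:-/sys/devices/system/cpu}
    cut -s -d, -f2 "$sys_path"/cpu*/topology/thread_siblings_list | sort -un
}

# Print, rather than execute, the writes that would offline those CPUs.
show_offline_commands() {
    for cpu in $(secondary_siblings "$1"); do
        echo "echo 0 > ${1:-/sys/devices/system/cpu}/cpu$cpu/online"
    done
}

# Example invocation, left commented out on purpose:
#   show_offline_commands /sys/devices/system/cpu
```

Because `cut -s` drops lines that contain no comma, sibling lists written as a range (for example `0-1`) would be skipped by this pipeline, just as they would be by the original script.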

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-11  0:52   ` Vinson Lee
@ 2018-01-17  0:17     ` Vinson Lee
  0 siblings, 0 replies; 10+ messages in thread
From: Vinson Lee @ 2018-01-17 0:17 UTC
  To: Bart Van Assche, Thomas Gleixner, Christoph Hellwig
  Cc: linux-scsi@vger.kernel.org, don.brace@microsemi.com

On Wed, Jan 10, 2018 at 4:52 PM, Vinson Lee <vlee@freedesktop.org> wrote:
> On Fri, Jan 5, 2018 at 8:32 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
>>> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
>>> prompt and hangs with Linux 4.13 or later. I cannot log in on console
>>> or SSH into the machine. Linux 4.12 and older boot fine.
>>>
>>> I see these messages on the console.
>>>
>>> [  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
>>> [  242.877835]       Not tainted 4.15.0-041500rc6-generic #201712312330
>>
>> It seems like something got stuck in the block layer. The traditional way to
>> debug this is to analyze the information that is available under
>> /sys/kernel/debug/block. However, since login is not possible we can't use
>> that approach. Would it be possible for you to check whether this has been
>> resolved in kernel v4.15-rc6, and if not, bisect this?
>>
>> Thanks,
>>
>> Bart.
>
> Hi.
>
> The machine still hangs with Linux 4.15-rc6.
>
> I did a bisect. The hang is introduced with Linux 4.13-rc1 commit
> c5cb83bb337c25caae995d992d1cdf9b317f83de "genirq/cpuhotplug: Handle
> managed IRQs on CPU hotplug".
>
> There is a startup script that disables hyperthreading by offlining
> sibling CPUs.
>
> for CPU in $(cut -s -d, -f2
> $SYS_PATH/cpu*/topology/thread_siblings_list | sort -un); do
>     echo 0 > /sys/devices/system/cpu/cpu$CPU/online
> done
>
> If the above script is not run, the machine does not hang with Linux 4.13.
>
> Cheers,
> Vinson

Hi.

HP ProLiant DL360p Gen8 still hangs with Linux 4.15-rc8.

I now also see hangs on another machine, with a Microsemi Adaptec
RAID 71605 and the aacraid driver, on both Linux 4.13 and Linux 4.15-rc8.

Cheers,
Vinson

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-04 22:32 HP ProLiant DL360p Gen8 hangs with Linux 4.13+ Vinson Lee
  2018-01-05 16:32 ` Bart Van Assche
@ 2018-01-14 23:40 ` Laurence Oberman
  2018-01-15 12:17   ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Laurence Oberman @ 2018-01-14 23:40 UTC
  To: Vinson Lee, linux-scsi, Don Brace
  Cc: Hellwig, Christoph, Jens Axboe

On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> Hi.
>
> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> prompt and hangs with Linux 4.13 or later. I cannot log in on console
> or SSH into the machine. Linux 4.12 and older boot fine.
>
>
...

...

This issue bit me for two straight days.
I was testing Mike Snitzer's combined tree and this commit crept into
the latest combined tree.

commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jan 12 10:53:05 2018 +0800

    genirq/affinity: assign vectors to all possible CPUs

    Currently we assign managed interrupt vectors to all present CPUs. This
    works fine for systems were we only online/offline CPUs. But in case of
    systems that support physical CPU hotplug (or the virtualized version of
    it) this means the additional CPUs covered for in the ACPI tables or on
    the command line are not catered for. To fix this we'd either need to
    introduce new hotplug CPU states just for this case, or we can start
    assining vectors to possible but not present CPUs.

    Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
    Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
    Cc: linux-kernel@vger.kernel.org
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

The reason I never suspected this commit as the cause of my latest hang
is that I have used Linus' tree all the way to 4.15-rc7 with no issues.

Vinson reporting it against 4.13 or later was not making sense because
I had not seen the hang until this weekend.

I checked and it's in Linus's tree, but it's not an issue in the generic
4.15-rc7 for me.

Anyway, it's possibly going to bite anybody running HP DL servers with
HPSA boot devices.
I have not tried the workaround below.

From Vinson's message, repeated here:

"The machine still hangs with Linux 4.15-rc6.

I did a bisect. The hang is introduced with Linux 4.13-rc1 commit
c5cb83bb337c25caae995d992d1cdf9b317f83de "genirq/cpuhotplug: Handle
managed IRQs on CPU hotplug".

There is a startup script that disables hyperthreading by offlining
sibling CPUs.

for CPU in $(cut -s -d, -f2 $SYS_PATH/cpu*/topology/thread_siblings_list | sort -un); do
    echo 0 > /sys/devices/system/cpu/cpu$CPU/online
done

If the above script is not run, the machine does not hang with Linux 4.13.

Cheers,
Vinson"

Thanks
Laurence

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-14 23:40 ` Laurence Oberman
@ 2018-01-15 12:17   ` Ming Lei
  2018-01-15 12:51     ` Laurence Oberman
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-01-15 12:17 UTC
  To: Laurence Oberman
  Cc: Vinson Lee, linux-scsi, Don Brace, Hellwig, Christoph, Jens Axboe, Thomas Gleixner, linux-kernel

On Sun, Jan 14, 2018 at 06:40:40PM -0500, Laurence Oberman wrote:
> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> > Hi.
> >
> > HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> > prompt and hangs with Linux 4.13 or later. I cannot log in on console
> > or SSH into the machine. Linux 4.12 and older boot fine.
> >
> >
> ...
>
> ...
>
> This issue bit me for for two straight days.
> I was testing Mike Snitzers combined tree and this commit crept into
> the latest combined tree.
>
> commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Fri Jan 12 10:53:05 2018 +0800
>
>     genirq/affinity: assign vectors to all possible CPUs
>
>     Currently we assign managed interrupt vectors to all present
> CPUs. This
>     works fine for systems were we only online/offline CPUs. But in
> case of
>     systems that support physical CPU hotplug (or the virtualized
> version of
>     it) this means the additional CPUs covered for in the ACPI tables
> or on
>     the command line are not catered for. To fix this we'd either need
> to
>     introduce new hotplug CPU states just for this case, or we can
> start
>     assining vectors to possible but not present CPUs.
>
>     Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
>     Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>     Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
>     Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
>     Cc: linux-kernel@vger.kernel.org
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> Reason I never thought about this being my reason for the latest hang
> is I have used Linus' tree all the way to 4.15-rc7 with no issues.
>
> Vinson reporting it against 4.13 or later was not making sense because
> I had not seen the hang until this weekend.
>
> I checked and its in Linus's tree but its not an issue in the generic
> 4.15-rc7 for me.

Hi Laurence,

Wrt. your issue, I have investigated a bit and found that it happens
because one irq vector may be assigned to CPUs that are all offline, and
it may not be the same issue as Vinson's.

The following patch can address your issue; I may prepare a formal
version if no one objects to this approach.

Thomas, Christoph, could you take a look at this patch?

---
 kernel/irq/affinity.c | 69 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 22 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index a37a3b4b6342..dfc1f6a9c488 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -94,6 +94,39 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 	return nodes;
 }
 
+/*
+ * Spread the affinity of @nmsk into @nr_vecs irq vectors, and the
+ * result is stored to @start_irqmsk.
+ */
+static int irq_vecs_spread_affinity(struct cpumask *irqmsk,
+				    int max_irqmsks,
+				    struct cpumask *nmsk,
+				    int max_ncpus)
+{
+	int v, ncpus;
+	int vecs_to_assign, extra_vecs;
+
+	/* Calculate the number of cpus per vector */
+	ncpus = cpumask_weight(nmsk);
+	vecs_to_assign = min(max_ncpus, ncpus);
+
+	/* Account for rounding errors */
+	extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
+
+	for (v = 0; v < min(max_irqmsks, vecs_to_assign); v++) {
+		int cpus_per_vec = ncpus / vecs_to_assign;
+
+		/* Account for extra vectors to compensate rounding errors */
+		if (extra_vecs) {
+			cpus_per_vec++;
+			--extra_vecs;
+		}
+		irq_spread_init_one(irqmsk + v, nmsk, cpus_per_vec);
+	}
+
+	return v;
+}
+
 /**
  * irq_create_affinity_masks - Create affinity masks for multiqueue spreading
  * @nvecs:	The total number of vectors
@@ -104,7 +137,7 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 struct cpumask *
 irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 {
-	int n, nodes, cpus_per_vec, extra_vecs, curvec;
+	int n, nodes, curvec;
 	int affv = nvecs - affd->pre_vectors - affd->post_vectors;
 	int last_affv = affv + affd->pre_vectors;
 	nodemask_t nodemsk = NODE_MASK_NONE;
@@ -154,33 +187,25 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	}
 
 	for_each_node_mask(n, nodemsk) {
-		int ncpus, v, vecs_to_assign, vecs_per_node;
+		int vecs_per_node;
 
 		/* Spread the vectors per node */
 		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
 
-		/* Get the cpus on this node which are in the mask */
-		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
 
-		/* Calculate the number of cpus per vector */
-		ncpus = cpumask_weight(nmsk);
-		vecs_to_assign = min(vecs_per_node, ncpus);
-
-		/* Account for rounding errors */
-		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
-
-		for (v = 0; curvec < last_affv && v < vecs_to_assign;
-		     curvec++, v++) {
-			cpus_per_vec = ncpus / vecs_to_assign;
-
-			/* Account for extra vectors to compensate rounding errors */
-			if (extra_vecs) {
-				cpus_per_vec++;
-				--extra_vecs;
-			}
-			irq_spread_init_one(masks + curvec, nmsk, cpus_per_vec);
-		}
+		/* spread non-online possible cpus */
+		cpumask_andnot(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
+		irq_vecs_spread_affinity(&masks[curvec], last_affv - curvec,
+					 nmsk, vecs_per_node);
 
+		/*
+		 * spread online possible cpus to make sure each vector
+		 * can get one online cpu to handle
+		 */
+		cpumask_and(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
+		curvec += irq_vecs_spread_affinity(&masks[curvec],
+						   last_affv - curvec,
+						   nmsk, vecs_per_node);
 		if (curvec >= last_affv)
 			break;
 		--nodes;
-- 
2.9.5

-- 
Ming
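
Editorial sketch (not part of the thread): the rounding arithmetic used by `irq_vecs_spread_affinity()` in the patch above can be illustrated in shell. The function name `spread_counts` is hypothetical, and it only models the per-vector CPU counts, not the cpumask handling or the `max_irqmsks` bound. The guard against an empty CPU set is an addition made for this sketch; as posted, the helper computes `ncpus / vecs_to_assign` without such a check, which would appear to divide by zero when the mask passed in is empty (for example, a node with no offline CPUs).

```sh
#!/bin/sh
# Hypothetical helper: print how many CPUs each vector would receive when
# ncpus CPUs are spread over at most max_vecs vectors, mirroring the
# cpus_per_vec / extra_vecs arithmetic in the patch above.
spread_counts() {
    ncpus=$1
    max_vecs=$2
    # Guard added for this sketch: nothing to spread, and the division
    # below would otherwise divide by zero.
    if [ "$ncpus" -le 0 ] || [ "$max_vecs" -le 0 ]; then
        return 0
    fi
    # vecs_to_assign = min(max_vecs, ncpus)
    if [ "$max_vecs" -lt "$ncpus" ]; then
        vecs_to_assign=$max_vecs
    else
        vecs_to_assign=$ncpus
    fi
    base=$((ncpus / vecs_to_assign))
    # CPUs left over after integer division; the first vectors get one more.
    extra=$((ncpus - vecs_to_assign * base))
    v=0
    out=""
    while [ "$v" -lt "$vecs_to_assign" ]; do
        n=$base
        if [ "$extra" -gt 0 ]; then
            n=$((n + 1))
            extra=$((extra - 1))
        fi
        out="$out${out:+ }$n"
        v=$((v + 1))
    done
    echo "$out"
}

spread_counts 10 4   # prints: 3 3 2 2
```

In the patch this spreading runs twice per node, first over the possible-but-offline CPUs and then over the online ones, writing into the same vector masks; the second pass is what is meant to give every vector at least one online CPU.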

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-15 12:17   ` Ming Lei
@ 2018-01-15 12:51     ` Laurence Oberman
  2018-01-15 15:01       ` Hellwig, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Laurence Oberman @ 2018-01-15 12:51 UTC
  To: Ming Lei
  Cc: Vinson Lee, linux-scsi, Don Brace, Hellwig, Christoph, Jens Axboe, Thomas Gleixner, linux-kernel

On Mon, 2018-01-15 at 20:17 +0800, Ming Lei wrote:
> On Sun, Jan 14, 2018 at 06:40:40PM -0500, Laurence Oberman wrote:
> > On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> > > Hi.
> > >
> > > HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> > > prompt and hangs with Linux 4.13 or later. I cannot log in on
> > > console
> > > or SSH into the machine. Linux 4.12 and older boot fine.
> > >
> > >
> >
> > ...
> >
> > ...
> >
> > This issue bit me for for two straight days.
> > I was testing Mike Snitzers combined tree and this commit crept
> > into
> > the latest combined tree.
> >
> > commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
> > Author: Christoph Hellwig <hch@lst.de>
> > Date:   Fri Jan 12 10:53:05 2018 +0800
> >
> >     genirq/affinity: assign vectors to all possible CPUs
> >
> >     Currently we assign managed interrupt vectors to all present
> > CPUs. This
> >     works fine for systems were we only online/offline CPUs. But
> > in
> > case of
> >     systems that support physical CPU hotplug (or the virtualized
> > version of
> >     it) this means the additional CPUs covered for in the ACPI
> > tables
> > or on
> >     the command line are not catered for. To fix this we'd either
> > need
> > to
> >     introduce new hotplug CPU states just for this case, or we can
> > start
> >     assining vectors to possible but not present CPUs.
> >
> >     Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >     Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >     Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
> >     Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present
> > CPU")
> >     Cc: linux-kernel@vger.kernel.org
> >     Cc: Thomas Gleixner <tglx@linutronix.de>
> >     Signed-off-by: Christoph Hellwig <hch@lst.de>
> >     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > Reason I never thought about this being my reason for the latest
> > hang
> > is I have used Linus' tree all the way to 4.15-rc7 with no issues.
> >
> > Vinson reporting it against 4.13 or later was not making sense
> > because
> > I had not seen the hang until this weekend.
> >
> > I checked and its in Linus's tree but its not an issue in the
> > generic
> > 4.15-rc7 for me.
>
> Hi Laurence,
>
> Wrt. your issue, I have investigated a bit and found that it is
> because
> one irq vector may be assigned to all offline CPUs, and it may not be
> same with Vinson's.
>
> And the following patch can address your issue, I may prepare a
> formal
> version if no one objects this approach.
>
> Thomas, Christoph, could you take a look this patch?
>
> ---
>  kernel/irq/affinity.c | 69 +++++++++++++++++++++++++++++++++++----
> ------------
>  1 file changed, 47 insertions(+), 22 deletions(-)
>
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index a37a3b4b6342..dfc1f6a9c488 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -94,6 +94,39 @@ static int get_nodes_in_cpumask(cpumask_var_t
> *node_to_possible_cpumask,
>  	return nodes;
>  }
>
> +/*
> + * Spread the affinity of @nmsk into @nr_vecs irq vectors, and the
> + * result is stored to @start_irqmsk.
> + */
> +static int irq_vecs_spread_affinity(struct cpumask *irqmsk,
> +				    int max_irqmsks,
> +				    struct cpumask *nmsk,
> +				    int max_ncpus)
> +{
> +	int v, ncpus;
> +	int vecs_to_assign, extra_vecs;
> +
> +	/* Calculate the number of cpus per vector */
> +	ncpus = cpumask_weight(nmsk);
> +	vecs_to_assign = min(max_ncpus, ncpus);
> +
> +	/* Account for rounding errors */
> +	extra_vecs = ncpus - vecs_to_assign * (ncpus /
> vecs_to_assign);
> +
> +	for (v = 0; v < min(max_irqmsks, vecs_to_assign); v++) {
> +		int cpus_per_vec = ncpus / vecs_to_assign;
> +
> +		/* Account for extra vectors to compensate rounding
> errors */
> +		if (extra_vecs) {
> +			cpus_per_vec++;
> +			--extra_vecs;
> +		}
> +		irq_spread_init_one(irqmsk + v, nmsk, cpus_per_vec);
> +	}
> +
> +	return v;
> +}
> +
>  /**
>   * irq_create_affinity_masks - Create affinity masks for multiqueue
> spreading
>   * @nvecs:	The total number of vectors
> @@ -104,7 +137,7 @@ static int get_nodes_in_cpumask(cpumask_var_t
> *node_to_possible_cpumask,
>  struct cpumask *
>  irq_create_affinity_masks(int nvecs, const struct irq_affinity
> *affd)
>  {
> -	int n, nodes, cpus_per_vec, extra_vecs, curvec;
> +	int n, nodes, curvec;
>  	int affv = nvecs - affd->pre_vectors - affd->post_vectors;
>  	int last_affv = affv + affd->pre_vectors;
>  	nodemask_t nodemsk = NODE_MASK_NONE;
> @@ -154,33 +187,25 @@ irq_create_affinity_masks(int nvecs, const
> struct irq_affinity *affd)
>  	}
>
>  	for_each_node_mask(n, nodemsk) {
> -		int ncpus, v, vecs_to_assign, vecs_per_node;
> +		int vecs_per_node;
>
>  		/* Spread the vectors per node */
>  		vecs_per_node = (affv - (curvec - affd-
> >pre_vectors)) / nodes;
>
> -		/* Get the cpus on this node which are in the mask
> */
> -		cpumask_and(nmsk, cpu_possible_mask,
> node_to_possible_cpumask[n]);
>
> -		/* Calculate the number of cpus per vector */
> -		ncpus = cpumask_weight(nmsk);
> -		vecs_to_assign = min(vecs_per_node, ncpus);
> -
> -		/* Account for rounding errors */
> -		extra_vecs = ncpus - vecs_to_assign * (ncpus /
> vecs_to_assign);
> -
> -		for (v = 0; curvec < last_affv && v <
> vecs_to_assign;
> -		     curvec++, v++) {
> -			cpus_per_vec = ncpus / vecs_to_assign;
> -
> -			/* Account for extra vectors to compensate
> rounding errors */
> -			if (extra_vecs) {
> -				cpus_per_vec++;
> -				--extra_vecs;
> -			}
> -			irq_spread_init_one(masks + curvec, nmsk,
> cpus_per_vec);
> -		}
> +		/* spread non-online possible cpus */
> +		cpumask_andnot(nmsk, node_to_possible_cpumask[n],
> cpu_online_mask);
> +		irq_vecs_spread_affinity(&masks[curvec], last_affv -
> curvec,
> +					 nmsk, vecs_per_node);
>
> +		/*
> +		 * spread online possible cpus to make sure each
> vector
> +		 * can get one online cpu to handle
> +		 */
> +		cpumask_and(nmsk, node_to_possible_cpumask[n],
> cpu_online_mask);
> +		curvec += irq_vecs_spread_affinity(&masks[curvec],
> +						   last_affv -
> curvec,
> +						   nmsk,
> vecs_per_node);
>  		if (curvec >= last_affv)
>  			break;
>  		--nodes;
> --
> 2.9.5
>
>

Hello Ming

I will test the patch.

I did not spend a lot of time checking whether this weekend's stalls were
an exact match to Vinson's; I just knew that pulling that patch resolved
them.

Perhaps this explains why I was not seeing this on generic 4.15-rc7.

Thanks
Laurence

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-15 12:51     ` Laurence Oberman
@ 2018-01-15 15:01       ` Hellwig, Christoph
  2018-01-15 16:25         ` Laurence Oberman
  0 siblings, 1 reply; 10+ messages in thread
From: Hellwig, Christoph @ 2018-01-15 15:01 UTC
  To: Laurence Oberman
  Cc: Ming Lei, Vinson Lee, linux-scsi, Don Brace, Hellwig, Christoph, Jens Axboe, Thomas Gleixner, linux-kernel

Laurence, I'm a little confused. Is this the same issue we just fixed,
or is this an issue showing up with the fix?

E.g. what kernel versions or trees are affected?

* Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
  2018-01-15 15:01       ` Hellwig, Christoph
@ 2018-01-15 16:25         ` Laurence Oberman
  0 siblings, 0 replies; 10+ messages in thread
From: Laurence Oberman @ 2018-01-15 16:25 UTC
  To: Hellwig, Christoph
  Cc: Ming Lei, Vinson Lee, linux-scsi, Don Brace, Jens Axboe, Thomas Gleixner, linux-kernel

On Mon, 2018-01-15 at 07:01 -0800, Hellwig, Christoph wrote:
> Laurence, I'm a little confused. Is this the same issue we just
> fixed,
> or is this an issue showing up with the fix?
>
> E.g. what kernel versions or trees are affected?

Hello Christoph

This showed up on a combined tree of Mike's and Jens's
(4.15.0-rc4.block.dm.4.16) that I was testing this weekend, but it was
not apparent on the generic upstream 4.15-rc7 from Linus.
I have to admit that was puzzling me.

When I removed your commit the issue went away.

Ming has crafted a fix so that your original commit can remain in, and I
am testing that now against the same tree that was hanging before.

Ming has a handle on the issue, so I will report back after testing.
The kernel is building now.

Thanks
Laurence