From: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: jgg-uk2M96/98Pc@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Michael J. Ruhl"
	<michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Patel Jay P <jay.p.patel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Sebastian Sanchez
	<sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: [PATCH for-next 01/11] IB/hfi1: Destroy link_wq workqueue after free_irq()
Date: Mon, 18 Dec 2017 19:56:16 -0800
Message-ID: <20171219035612.2126.10447.stgit@scvm10.sc.intel.com>
In-Reply-To: <20171219034753.2126.78386.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>

From: Patel Jay P <jay.p.patel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

A sporadic crash occurs when the handle_8051_interrupt() handler is
invoked during rmmod. The handler runs after all workqueue-related
resources have been freed, which results in a crash.

Call Trace:
 queue_work_on+0x27/0x40
 handle_8051_interrupt+0x417/0x710 [hfi1]
 ? handle_dcc_err+0x212/0x660 [hfi1]
 ? check_preempt_wakeup+0x119/0x250
 ? tracing_is_on+0x15/0x30
 ? tracing_record_taskinfo_skip+0x1e/0x40
 ? radix_tree_next_chunk+0x10b/0x2e0
 ? __slab_free+0x9b/0x2c0
 interrupt_clear_down+0x43/0x120 [hfi1]
 is_dc_int+0x2f/0xa0 [hfi1]
 general_interrupt+0x18c/0x1f0 [hfi1]
 __free_irq+0x1b3/0x2d0
 free_irq+0x35/0x70
 pci_free_irq+0x1c/0x30
 clean_up_interrupts+0x53/0xf0 [hfi1]
 hfi1_start_cleanup+0x122/0x190 [hfi1]
 postinit_cleanup+0x1d/0x280 [hfi1]
 remove_one+0x233/0x250 [hfi1]
 pci_device_remove+0x39/0xc0

When the kernel is built with the CONFIG_DEBUG_SHIRQ config flag,
__free_irq() makes an extra call to the IRQ handler. The driver should
be prepared for this fake call.

Add a mechanism that detects whether the handler is invoked after
interrupts have been disabled. An hfi_intr_mask field is added to the
hfi1_devdata structure; it is a copy of the interrupt mask registers of
the hfi device and is updated whenever a value is written to them.
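
The shadow-mask idea can be illustrated with a small user-space sketch.
This is not the driver code: struct fake_dev, NUM_INT_CSRS, and the
simulated register arrays are hypothetical stand-ins, and only the
filtering step mirrors what the patch adds to general_interrupt().

```c
#include <assert.h>
#include <stdint.h>

#define NUM_INT_CSRS 2	/* hypothetical; the driver uses CCE_NUM_INT_CSRS */

/* Hypothetical stand-in for the device and its software state. */
struct fake_dev {
	uint64_t hw_mask[NUM_INT_CSRS];		/* simulated mask CSRs */
	uint64_t hw_status[NUM_INT_CSRS];	/* simulated pending sources */
	uint64_t intr_mask[NUM_INT_CSRS];	/* software copy of hw_mask */
};

/* Write the mask registers and keep the software copy in step. */
static void set_intr_state(struct fake_dev *dd, int enable, uint64_t mask)
{
	for (int i = 0; i < NUM_INT_CSRS; i++) {
		uint64_t v = enable ? mask : 0;

		dd->hw_mask[i] = v;
		dd->intr_mask[i] = v;
	}
}

/*
 * Read and clear the pending sources, then drop any source that is not
 * enabled in the software copy. Returns the number of sources that
 * would be dispatched to a handler.
 */
static int general_interrupt(struct fake_dev *dd)
{
	int dispatched = 0;

	for (int i = 0; i < NUM_INT_CSRS; i++) {
		uint64_t regs = dd->hw_status[i];

		dd->hw_status[i] = 0;		/* clear what was read */
		regs &= dd->intr_mask[i];	/* ignore disabled sources */

		while (regs) {			/* count remaining set bits */
			regs &= regs - 1;
			dispatched++;
		}
	}
	return dispatched;
}
```

In this sketch, a call to general_interrupt() made after
set_intr_state(dd, 0, 0) still clears the pending bits but dispatches
nothing, which is the behavior wanted for the extra call made while the
IRQ is being freed.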

Destroy the link_wq workqueue after calling free_irq(). This ensures
that if the interrupt handler is invoked before or during free_irq(),
the workqueue is destroyed only after the interrupt has been handled.
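
The effect of the ordering can be shown with a minimal user-space model.
Everything named fake_* below is hypothetical and only imitates the
behavior described above: fake_free_irq() makes one extra handler call,
as free_irq() does with CONFIG_DEBUG_SHIRQ, and the handler records an
"unsafe" attempt where the real driver would have crashed in
queue_work_on().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical workqueue: only counts queued work items. */
struct fake_wq {
	int queued;
};

/* Hypothetical per-port state. */
struct fake_port {
	struct fake_wq wq_storage;
	struct fake_wq *link_wq;	/* NULL once destroyed */
	bool irq_requested;
	int unsafe_queue_attempts;	/* handler ran with no workqueue */
};

static void fake_port_init(struct fake_port *ppd)
{
	ppd->wq_storage.queued = 0;
	ppd->link_wq = &ppd->wq_storage;
	ppd->irq_requested = true;
	ppd->unsafe_queue_attempts = 0;
}

/* The handler queues work; without a workqueue the driver would crash. */
static void fake_irq_handler(struct fake_port *ppd)
{
	if (ppd->link_wq)
		ppd->link_wq->queued++;
	else
		ppd->unsafe_queue_attempts++;
}

/* Imitates the extra handler call made while the IRQ is released. */
static void fake_free_irq(struct fake_port *ppd)
{
	if (!ppd->irq_requested)
		return;
	fake_irq_handler(ppd);
	ppd->irq_requested = false;
}

static void fake_destroy_wq(struct fake_port *ppd)
{
	ppd->link_wq = NULL;
}

/* Order before the patch: workqueue destroyed, then IRQ freed. */
static int teardown_old_order(struct fake_port *ppd)
{
	fake_destroy_wq(ppd);
	fake_free_irq(ppd);
	return ppd->unsafe_queue_attempts;
}

/* Order after the patch: IRQ freed, then workqueue destroyed. */
static int teardown_new_order(struct fake_port *ppd)
{
	fake_free_irq(ppd);
	fake_destroy_wq(ppd);
	return ppd->unsafe_queue_attempts;
}
```

In this model teardown_old_order() reports one unsafe attempt, while
teardown_new_order() reports none because the late handler call still
finds a valid workqueue.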

Fixes: 05cb18fda926 ("IB/hfi1: Update HFI to use the latest PCI API")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Patel Jay P <jay.p.patel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c |    8 +++++++-
 drivers/infiniband/hw/hfi1/hfi.h  |    4 ++++
 drivers/infiniband/hw/hfi1/init.c |   31 ++++++++++++++++++++++---------
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 4f057e8..87748a6 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8224,6 +8224,8 @@ static irqreturn_t general_interrupt(int irq, void *data)
 		/* only clear if anything is set */
 		if (regs[i])
 			write_csr(dd, CCE_INT_CLEAR + (8 * i), regs[i]);
+
+		regs[i] &= dd->hfi_intr_mask[i];
 	}
 
 	/* phase 2: call the appropriate handler */
@@ -12942,12 +12944,15 @@ void set_intr_state(struct hfi1_devdata *dd, u32 enable)
 			u64 mask = get_int_mask(dd, i);
 
 			write_csr(dd, CCE_INT_MASK + (8 * i), mask);
+			dd->hfi_intr_mask[i] = mask;
 		}
 
 		init_qsfp_int(dd);
 	} else {
-		for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+		for (i = 0; i < CCE_NUM_INT_CSRS; i++) {
 			write_csr(dd, CCE_INT_MASK + (8 * i), 0ull);
+			dd->hfi_intr_mask[i] = 0ull;
+		}
 	}
 }
 
@@ -14773,6 +14778,7 @@ void hfi1_start_cleanup(struct hfi1_devdata *dd)
 	free_cntrs(dd);
 	free_rcverr(dd);
 	clean_up_interrupts(dd);
+	clean_up_workqueues(dd);
 	finish_chip_resources(dd);
 }
 
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 4a9b4d7..e12a80b 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1188,6 +1188,9 @@ struct hfi1_devdata {
 	/* INTx information */
 	u32 requested_intx_irq;		/* did we request one? */
 
+	/* copy of interrupt mask register */
+	u64 hfi_intr_mask[CCE_NUM_INT_CSRS];
+
 	/* general interrupt: mask of handled interrupts */
 	u64 gi_mask[CCE_NUM_INT_CSRS];
 
@@ -1993,6 +1996,7 @@ static inline void flush_wc(void)
 int kdeth_process_eager(struct hfi1_packet *packet);
 int process_receive_invalid(struct hfi1_packet *packet);
 void seqfile_dump_rcd(struct seq_file *s, struct hfi1_ctxtdata *rcd);
+void clean_up_workqueues(struct hfi1_devdata *dd);
 
 /* global module parameter variables */
 extern unsigned int hfi1_max_mtu;
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index 8e3b3e7..c84af52 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -823,6 +823,28 @@ static int create_workqueues(struct hfi1_devdata *dd)
 }
 
 /**
+ * clean_up_workqueues - destroys hfi1_wq and link_wq workqueues
+ * @dd: the hfi1_ib device
+ */
+void clean_up_workqueues(struct hfi1_devdata *dd)
+{
+	int pidx;
+	struct hfi1_pportdata *ppd;
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (ppd->hfi1_wq) {
+			destroy_workqueue(ppd->hfi1_wq);
+			ppd->hfi1_wq = NULL;
+		}
+		if (ppd->link_wq) {
+			destroy_workqueue(ppd->link_wq);
+			ppd->link_wq = NULL;
+		}
+	}
+}
+
+/**
  * hfi1_init - do the actual initialization sequence on the chip
  * @dd: the hfi1_ib device
  * @reinit: re-initializing, so don't allocate new memory
@@ -1102,15 +1124,6 @@ static void shutdown_device(struct hfi1_devdata *dd)
 		 * We can't count on interrupts since we are stopping.
 		 */
 		hfi1_quiet_serdes(ppd);
-
-		if (ppd->hfi1_wq) {
-			destroy_workqueue(ppd->hfi1_wq);
-			ppd->hfi1_wq = NULL;
-		}
-		if (ppd->link_wq) {
-			destroy_workqueue(ppd->link_wq);
-			ppd->link_wq = NULL;
-		}
 	}
 	sdma_exit(dd);
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
