From: Laurence Oberman <loberman@redhat.com>
To: "Madhani, Himanshu" <Himanshu.Madhani@cavium.com>,
	Li Wang <liwang@redhat.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	"Tran, Quinn" <Quinn.Tran@cavium.com>,
	"William.Kuzeja@stratus.com" <William.Kuzeja@stratus.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: qla2xxx cause BUG on kernel-4.17-rc6
Date: Wed, 06 Jun 2018 15:27:15 -0400
Message-ID: <1528313235.17774.5.camel@redhat.com>
In-Reply-To: <B3C74965-4B74-404A-9BA3-395F49F64179@cavium.com>

On Wed, 2018-06-06 at 18:31 +0000, Madhani, Himanshu wrote:
> Hi Li, 
> 
> > On Jun 6, 2018, at 11:05 AM, Laurence Oberman <loberman@redhat.com>
> > wrote:
> > 
> > On Wed, 2018-06-06 at 16:01 +0000, Madhani, Himanshu wrote:
> > > > On Jun 6, 2018, at 8:56 AM, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> > > > 
> > > > 
> > > > Himanshu,
> > > > 
> > > > Ping?
> > > > 
> > > 
> > > Will look at this one. Sorry, it somehow fell through the cracks.
> > > 
> > > 
> > > > > Hi scsi experts,
> > > > > 
> > > > > > I'm not sure who the right person to ask is. I just hit this
> > > > > > bug on my HP DL385 platform; can any of you take a look?
> > > > > 
> > > > > system config:
> > > > > -----------------
> > > > > HP ProLiant DL385 G7
> > > > > AMD Opteron(TM) Processor 6234
> > > > > 16384 MB memory, 369 GB disk space
> > > > > 
> > > > > 
> > > > > > [   24.539274] qla2xxx [0000:0c:00.7]-500a:5: LOOP UP detected (10 Gbps).
> > > > > > [   24.577259] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
> > > > > > [   24.623133] PGD 0 P4D 0
> > > > > > [   24.636760] Oops: 0000 [#1] SMP NOPTI
> > > > > > [   24.656942] Modules linked in: i2c_algo_bit drm_kms_helper sr_mod(+) syscopyarea sysfillrect sysimgblt cdrom fb_sys_fops ata_generic ttm pata_acpi sd_mod ahci pata_atiixp sfc(+) qla2xxx(+) libahci drm qla4xxx(+) nvme_fc hpsa mdio libiscsi qlcnic(+) nvme_fabrics scsi_transport_sas serio_raw mtd crc32c_intel libata nvme_core i2c_core scsi_transport_iscsi tg3 scsi_transport_fc bnx2 iscsi_boot_sysfs dm_multipath dm_mirror dm_region_hash dm_log dm_mod
> > > > > > [   24.887449] CPU: 0 PID: 177 Comm: kworker/0:3 Not tainted 4.17.0-rc6 #1
> > > > > > [   24.925119] Hardware name: HP ProLiant DL385 G7, BIOS A18 08/15/2012
> > > > > > [   24.962106] Workqueue: events work_for_cpu_fn
> > > > > > [   24.987098] RIP: 0010:__queue_work+0x1f/0x3a0
> > > > > > [   25.011672] RSP: 0018:ffff992642ceba10 EFLAGS: 00010082
> > > > > > [   25.042116] RAX: 0000000000000082 RBX: 0000000000000082 RCX: 0000000000000000
> > > > > > [   25.083293] RDX: ffff8cf9abc6d7d0 RSI: 0000000000000000 RDI: 0000000000002000
> > > > > > [   25.123094] RBP: 0000000000000000 R08: 0000000000025a40 R09: ffff8cf9aade2880
> > > > > > [   25.164087] R10: 0000000000000000 R11: ffff992642ceb6f0 R12: ffff8cf9abc6d7d0
> > > > > > [   25.202280] R13: 0000000000002000 R14: ffff8cf9abc6d7b8 R15: 0000000000002000
> > > > > > [   25.242050] FS:  0000000000000000(0000) f9b5c00000(0000) knlGS:0000000000000000
> > > > > > [   25.977565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [   26.010457] CR2: 0000000000000102 CR3: 000000030760a000 CR4: 00000000000406f0
> > > > > > [   26.051048] Call Trace:
> > > > > > [   26.063572]  ? __switch_to_asm+0x34/0x70
> > > > > > [   26.086079]  queue_work_on+0x24/0x40
> > > > > > [   26.107090]  qla2x00_post_work+0x81/0xb0 [qla2xxx]
> > > > > > [   26.133356]  qla2x00_async_event+0x1ad/0x1a20 [qla2xxx]
> > > > > > [   26.164075]  ? lock_timer_base+0x67/0x80
> > > > > > [   26.186420]  ? try_to_del_timer_sync+0x4d/0x80
> > > > > > [   26.212284]  ? del_timer_sync+0x35/0x40
> > > > > > [   26.234080]  ? schedule_timeout+0x165/0x2f0
> > > > > > [   26.259575]  qla82xx_poll+0x13e/0x180 [qla2xxx]
> > > > > > [   26.285740]  qla2x00_mailbox_command+0x74b/0xf50 [qla2xxx]
> > > > > > [   26.319040]  qla82xx_set_driver_version+0x13b/0x1c0 [qla2xxx]
> > > > > > [   26.352108]  ? qla2x00_init_rings+0x206/0x3f0 [qla2xxx]
> > > > > > [   26.381733]  qla2x00_initialize_adapter+0x35c/0x7f0 [qla2xxx]
> > > > > > [   26.413240]  qla2x00_probe_one+0x1479/0x2390 [qla2xxx]
> > > > > > [   26.442055]  local_pci_probe+0x3f/0xa0
> > > > > > [   26.463108]  work_for_cpu_fn+0x10/0x20
> > > > > > [   26.483295]  process_one_work+0x152/0x350
> > > > > > [   26.505730]  worker_thread+0x1cf/0x3e0
> > > > > > [   26.527090]  kthread+0xf5/0x130
> > > > > > [   26.545085]  ? max_active_store+0x80/0x80
> > > > > > [   26.568085]  ? kthread_bind+0x10/0x10
> > > > > > [   26.589533]  ret_from_fork+0x22/0x40
> > > > > > [   26.610192] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 89 ff 41 56 41 55 41 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 83 ec 0 86 02 01 00 00 01 0f 85 80 02 00 00 49 c7 c6 c0 ec 01 00 41
> > > > > > [   27.308540] RIP: __queue_work+0x1f/0x3a0 RSP: ffff992642ceba10
> > > > > > [   27.341591] CR2: 0000000000000102
> > > > > > [   27.360208] ---[ end trace 01b7b7ae2c005cf3 ]---
> > > > 
> > > > -- 
> > > > Martin K. Petersen	Oracle Linux Engineering
> > > 
> > > Thanks,
> > > - Himanshu
> > > 
> > 
> > I can't find the original message for this that Martin reminded us
> > of.
> > 
> > To the person who logged this:
> > How many times has this happened, and was it after a kernel update?
> > What is the history, what is the exact QLogic card, etc.?
> > Do you have the rest of the log leading up to the invalid pointer
> > fault?
> > 
> > Thanks
> > Laurence
> 
> From the snippet of the log provided, it looks like the crash is with
> a 10G FCoE adapter.
> 
> Can you try this untested diff to see if it resolves the issue?
> 
> Basically, we are initializing the adapter, so the driver will start
> receiving AEN notifications, but we have not yet allocated the work
> queue for them.
> 
> 
> ————— <snip> ————
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 30bf4b9..462d825 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3229,6 +3229,8 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>             "req->req_q_in=%p req->req_q_out=%p rsp->rsp_q_in=%p rsp->rsp_q_out=%p.\n",
>             req->req_q_in, req->req_q_out, rsp->rsp_q_in, rsp->rsp_q_out);
> +       ha->wq = alloc_workqueue("qla2xxx_wq", 0, 0);
> +
>         if (ha->isp_ops->initialize_adapter(base_vha)) {
>                 ql_log(ql_log_fatal, base_vha, 0x00d6,
>                     "Failed to initialize adapter - Adapter flags %x.\n",
> @@ -3270,7 +3272,7 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>             host->can_queue, base_vha->req,
>             base_vha->mgmt_svr_loop_id, host->sg_tablesize);
>         INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
> -       ha->wq = alloc_workqueue("qla2xxx_wq", 0, 0);
> +
>         if (ha->mqenable) {
>                 bool mq = false;
> 
> ————— </snip> ————
> 
> Thanks,
> - Himanshu
> 

Makes sense, but how did they escape this happening before?
I cannot find the one that we looked at together about this, but mine
was not at 10G.
