linux-ide.vger.kernel.org archive mirror
* kernel problems with smart on LSI-92xx
@ 2010-11-09 19:27 Bokhan Artem
  2010-11-11 14:06 ` Bjørn Mork
  0 siblings, 1 reply; 7+ messages in thread
From: Bokhan Artem @ 2010-11-09 19:27 UTC (permalink / raw)
  To: linux-ide

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

Hello.

I have kernel problems when trying to run SMART commands on an LSI-92xx
controller.

Running a self-test on a SAS disk (smartctl /dev/sda -dmegaraid,24 -t long)
with smartmontools causes a kernel oops (?) (and a segfault). See the
attachment for dmesg.

strace of smartctl:

mknod("/dev/megaraid_sas_ioctl_node", S_IFCHR, makedev(251, 0)) = -1 EEXIST (File exists)
close(4)                                = 0
munmap(0x7f4be3a9f000, 4096)            = 0
open("/dev/megaraid_sas_ioctl_node", O_RDWR) = 4
ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574ed30) = 0
ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574eb30) = 0
ioctl(4, MTRRIOC_SET_ENTRY <unfinished ...>
+++ killed by SIGSEGV +++


Viewing SMART info is OK (smartctl /dev/sda -dmegaraid,24 -a).
Running a self-test on a SATA disk on the same system is OK.

The problem is reproducible with 2.6.32 and 2.6.36 kernels.

I understand that linux-ide@vger.kernel.org is not the right place to ask
this question, but I believe that somebody here can point me towards a way
to resolve the problem.

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 4876 bytes --]

[   69.162393] ------------[ cut here ]------------
[   69.162404] WARNING: at /build/buildd/linux-2.6.32/mm/page_alloc.c:1806 __alloc_pages_slowpath+0x43b/0x580()
[   69.162407] Hardware name: X8DTN
[   69.162409] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate ioatdma radeon ttm drm_kms_helper shpchp drm i2c_algo_bit lp parport floppy pata_jmicron megaraid_sas igb dca
[   69.162429] Pid: 1206, comm: smartctl Not tainted 2.6.32-25-server #45-Ubuntu
[   69.162432] Call Trace:
[   69.162439]  [<ffffffff81065f3b>] warn_slowpath_common+0x7b/0xc0
[   69.162443]  [<ffffffff81065f94>] warn_slowpath_null+0x14/0x20
[   69.162447]  [<ffffffff810f98fb>] __alloc_pages_slowpath+0x43b/0x580
[   69.162454]  [<ffffffff8101078c>] ? __switch_to+0x1ac/0x320
[   69.162459]  [<ffffffff81057850>] ? finish_task_switch+0x50/0xe0
[   69.162463]  [<ffffffff810f9bb1>] __alloc_pages_nodemask+0x171/0x180
[   69.162468]  [<ffffffff81017536>] dma_generic_alloc_coherent+0xa6/0x160
[   69.162475]  [<ffffffff81038b01>] x86_swiotlb_alloc_coherent+0x31/0x70
[   69.162482]  [<ffffffffa002d0ce>] megasas_mgmt_fw_ioctl+0x1ae/0x690 [megaraid_sas]
[   69.162488]  [<ffffffffa002d748>] megasas_mgmt_ioctl_fw+0x198/0x240 [megaraid_sas]
[   69.162494]  [<ffffffffa002f695>] megasas_mgmt_ioctl+0x35/0x50 [megaraid_sas]
[   69.162500]  [<ffffffff81153b12>] vfs_ioctl+0x22/0xa0
[   69.162505]  [<ffffffff8115da2a>] ? alloc_fd+0x10a/0x150
[   69.162509]  [<ffffffff81153cb1>] do_vfs_ioctl+0x81/0x410
[   69.162515]  [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0
[   69.162518]  [<ffffffff811540c1>] sys_ioctl+0x81/0xa0
[   69.162523]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[   69.162526] ---[ end trace 6a2181b634e2abc6 ]---
[   69.162538] ------------[ cut here ]------------
[   69.162806] kernel BUG at /build/buildd/linux-2.6.32/lib/swiotlb.c:368!
[   69.163134] invalid opcode: 0000 [#1] SMP
[   69.163570] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[   69.163975] CPU 0
[   69.164227] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate ioatdma radeon ttm drm_kms_helper shpchp drm i2c_algo_bit lp parport floppy pata_jmicron megaraid_sas igb dca
[   69.167419] Pid: 1206, comm: smartctl Tainted: G        W  2.6.32-25-server #45-Ubuntu X8DTN
[   69.167843] RIP: 0010:[<ffffffff812c4dc5>]  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.168370] RSP: 0018:ffff88081c0ebc58  EFLAGS: 00010246
[   69.168655] RAX: 000000000003bffc RBX: 00000000ffffffff RCX: 0000000000000002
[   69.169000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88001dffe000
[   69.169346] RBP: ffff88081c0ebcb8 R08: 0000000000000000 R09: ffff880000030840
[   69.169691] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000000
[   69.170036] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000200000
[   69.170382] FS:  00007fb8de189720(0000) GS:ffff88001de00000(0000) knlGS:0000000000000000
[   69.170794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.171094] CR2: 00007fb8dd59237c CR3: 000000081a790000 CR4: 00000000000006f0
[   69.171439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   69.171784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   69.172130] Process smartctl (pid: 1206, threadinfo ffff88081c0ea000, task ffff88081a760000)
[   69.194513] Stack:
[   69.205788]  0000000000000034 00000002817e3390 0000000000000000 ffff88081c0ebe00
[   69.217739] <0> 0000000000000000 000000000003bffc 0000000000000000 0000000000000000
[   69.241250] <0> 0000000000000000 00000000ffffffff ffff88081c5b4080 ffff88081c0ebe00
[   69.277310] Call Trace:
[   69.289278]  [<ffffffff812c52ac>] swiotlb_alloc_coherent+0xec/0x130
[   69.301118]  [<ffffffff81038b31>] x86_swiotlb_alloc_coherent+0x61/0x70
[   69.313045]  [<ffffffffa002d0ce>] megasas_mgmt_fw_ioctl+0x1ae/0x690 [megaraid_sas]
[   69.336399]  [<ffffffffa002d748>] megasas_mgmt_ioctl_fw+0x198/0x240 [megaraid_sas]
[   69.359346]  [<ffffffffa002f695>] megasas_mgmt_ioctl+0x35/0x50 [megaraid_sas]
[   69.370902]  [<ffffffff81153b12>] vfs_ioctl+0x22/0xa0
[   69.382322]  [<ffffffff8115da2a>] ? alloc_fd+0x10a/0x150
[   69.393622]  [<ffffffff81153cb1>] do_vfs_ioctl+0x81/0x410
[   69.404696]  [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0
[   69.415761]  [<ffffffff811540c1>] sys_ioctl+0x81/0xa0
[   69.426640]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[   69.437491] Code: fe ff ff 48 8b 3d 74 38 76 00 41 bf 00 00 20 00 e8 51 f5 d7 ff 83 e0 ff 48 05 ff 07 00 00 48 c1 e8 0b 48 89 45 c8 e9 13 fe ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89
[   69.478216] RIP  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.489668]  RSP <ffff88081c0ebc58>
[   69.500975] ---[ end trace 6a2181b634e2abc7 ]---


* Re: kernel problems with smart on LSI-92xx
  2010-11-11 14:06 ` Bjørn Mork
@ 2010-11-10 17:21   ` Bokhan Artem
  2010-11-11 18:02     ` [PATCH] [SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent() Bjørn Mork
  2010-11-26 15:42   ` kernel problems with smart on LSI-92xx Artem Bokhan
  1 sibling, 1 reply; 7+ messages in thread
From: Bokhan Artem @ 2010-11-10 17:21 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-ide

Great, that works. Thank you a lot!

./smartctl /dev/sda -dmegaraid,24 -t long
smartctl 5.41 2010-11-05 r3203 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Extended Background Self Test has begun
Please wait 22 minutes for test to complete.
Estimated completion time: Thu Nov 11 17:44:50 2010

11.11.2010 20:06, Bjørn Mork wrote:
> Bokhan Artem<aptem@ngs.ru>  writes:
>
>> Hello.
>>
>> I have kernel problems when trying to run smart commands on LSI-92xx
>> controller.
>>
>> Running self-test  on SAS disk (smartctl /dev/sda -dmegaraid,24 -t
>> long) with smartmontools causes kernel oops (?) (and segfault). Look
>> in attachment for dmesg.
>>
>> strace of smartctl:
>>
>> mknod("/dev/megaraid_sas_ioctl_node", S_IFCHR, makedev(251, 0)) = -1 EEXIST
>> (File exists)
>> close(4)                                = 0
>> munmap(0x7f4be3a9f000, 4096)            = 0
>> open("/dev/megaraid_sas_ioctl_node", O_RDWR) = 4
>> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574ed30) = 0
>> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574eb30) = 0
>> ioctl(4, MTRRIOC_SET_ENTRY<unfinished ...>
>> +++ killed by SIGSEGV +++
>>
>>
>> Viewing  smart info is OK (smartctl /dev/sda -dmegaraid,24 -a).
>> Running self-test on SATA disk on the same system is OK.
>>
>> The problem is reproducible with 2.6.32 and 2.6.36 kernels.
> A quick look at this reveals that smartctl will happily do a
> MEGASAS_IOC_FIRMWARE ioctl with sge_count = 1 and sgl[0].iov_len = 0 if
> it is sending a command with dataLen == 0 :
>
>
> /* Issue passthrough scsi command to PERC5/6 controllers */
> bool linux_megaraid_device::megasas_cmd(int cdbLen, void *cdb,
>    int dataLen, void *data,
>    int /*senseLen*/, void * /*sense*/, int /*report*/)
> {
>    struct megasas_pthru_frame    *pthru;
>    struct megasas_iocpacket      uio;
>    int rc;
>
>    memset(&uio, 0, sizeof(uio));
>    pthru = (struct megasas_pthru_frame *)uio.frame.raw;
>    pthru->cmd = MFI_CMD_PD_SCSI_IO;
>    pthru->cmd_status = 0xFF;
>    pthru->scsi_status = 0x0;
>    pthru->target_id = m_disknum;
>    pthru->lun = 0;
>    pthru->cdb_len = cdbLen;
>    pthru->timeout = 0;
>    pthru->flags = MFI_FRAME_DIR_READ;
>    pthru->sge_count = 1;
>    pthru->data_xfer_len = dataLen;
>    pthru->sgl.sge32[0].phys_addr = (intptr_t)data;
>    pthru->sgl.sge32[0].length = (uint32_t)dataLen;
>    memcpy(pthru->cdb, cdb, cdbLen);
>
>    uio.host_no = m_hba;
>    uio.sge_count = 1;
>    uio.sgl_off = offsetof(struct megasas_pthru_frame, sgl);
>    uio.sgl[0].iov_base = data;
>    uio.sgl[0].iov_len = dataLen;
>
>    rc = 0;
>    errno = 0;
>    rc = ioctl(m_fd, MEGASAS_IOC_FIRMWARE, &uio);
>    if (pthru->cmd_status || rc != 0) {
>      if (pthru->cmd_status == 12) {
>        return set_err(EIO, "megasas_cmd: Device %d does not exist\n", m_disknum);
>      }
>      return set_err((errno ? errno : EIO), "megasas_cmd result: %d.%d = %d/%d",
>                     m_hba, m_disknum, errno,
>                     pthru->cmd_status);
>    }
>    return true;
> }
>
>
>
> The kernel bug is that the zero valued sgl[0].iov_len is passed
> unmodified to megasas_mgmt_fw_ioctl() which again passes it on as size
> to dma_alloc_coherent():
>
>          /*
>           * For each user buffer, create a mirror buffer and copy in
>           */
>          for (i = 0; i < ioc->sge_count; i++) {
>                  kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
>                                                      ioc->sgl[i].iov_len,
>                                                      &buf_handle, GFP_KERNEL);
>
>
>
> And it looks like most (all?) of the dma_alloc_coherent()
> implementations will use get_order(size) to compute the necessary
> allocation. This will fail if size == 0.
>
> On the other hand, I may have misunderstood this entirely....
>
> But if you dare, you could try the attached patch (compile tested only
> as I don't have the hardware) and see if it helps.  Let me know how it
> goes, and I'll forward it to the megaraid maintainers if it really fixes
> your problem.
>
>
>
>
> Bjørn
>



* Re: kernel problems with smart on LSI-92xx
  2010-11-09 19:27 kernel problems with smart on LSI-92xx Bokhan Artem
@ 2010-11-11 14:06 ` Bjørn Mork
  2010-11-10 17:21   ` Bokhan Artem
  2010-11-26 15:42   ` kernel problems with smart on LSI-92xx Artem Bokhan
  0 siblings, 2 replies; 7+ messages in thread
From: Bjørn Mork @ 2010-11-11 14:06 UTC (permalink / raw)
  To: linux-ide; +Cc: Bokhan Artem

[-- Attachment #1: Type: text/plain, Size: 3663 bytes --]

Bokhan Artem <aptem@ngs.ru> writes:

> Hello.
>
> I have kernel problems when trying to run smart commands on LSI-92xx
> controller.
>
> Running self-test  on SAS disk (smartctl /dev/sda -dmegaraid,24 -t
> long) with smartmontools causes kernel oops (?) (and segfault). Look
> in attachment for dmesg.
>
> strace of smartctl:
>
> mknod("/dev/megaraid_sas_ioctl_node", S_IFCHR, makedev(251, 0)) = -1 EEXIST
> (File exists)
> close(4)                                = 0
> munmap(0x7f4be3a9f000, 4096)            = 0
> open("/dev/megaraid_sas_ioctl_node", O_RDWR) = 4
> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574ed30) = 0
> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574eb30) = 0
> ioctl(4, MTRRIOC_SET_ENTRY <unfinished ...>
> +++ killed by SIGSEGV +++
>
>
> Viewing  smart info is OK (smartctl /dev/sda -dmegaraid,24 -a).
> Running self-test on SATA disk on the same system is OK.
>
> The problem is reproducible with 2.6.32 and 2.6.36 kernels.

A quick look at this reveals that smartctl will happily do a
MEGASAS_IOC_FIRMWARE ioctl with sge_count = 1 and sgl[0].iov_len = 0 if
it is sending a command with dataLen == 0 :


/* Issue passthrough scsi command to PERC5/6 controllers */
bool linux_megaraid_device::megasas_cmd(int cdbLen, void *cdb, 
  int dataLen, void *data,
  int /*senseLen*/, void * /*sense*/, int /*report*/)
{
  struct megasas_pthru_frame    *pthru;
  struct megasas_iocpacket      uio;
  int rc;

  memset(&uio, 0, sizeof(uio));
  pthru = (struct megasas_pthru_frame *)uio.frame.raw;
  pthru->cmd = MFI_CMD_PD_SCSI_IO;
  pthru->cmd_status = 0xFF;
  pthru->scsi_status = 0x0;
  pthru->target_id = m_disknum;
  pthru->lun = 0;
  pthru->cdb_len = cdbLen;
  pthru->timeout = 0;
  pthru->flags = MFI_FRAME_DIR_READ;
  pthru->sge_count = 1;
  pthru->data_xfer_len = dataLen;
  pthru->sgl.sge32[0].phys_addr = (intptr_t)data;
  pthru->sgl.sge32[0].length = (uint32_t)dataLen;
  memcpy(pthru->cdb, cdb, cdbLen);

  uio.host_no = m_hba;
  uio.sge_count = 1;
  uio.sgl_off = offsetof(struct megasas_pthru_frame, sgl);
  uio.sgl[0].iov_base = data;
  uio.sgl[0].iov_len = dataLen;

  rc = 0;
  errno = 0;
  rc = ioctl(m_fd, MEGASAS_IOC_FIRMWARE, &uio);
  if (pthru->cmd_status || rc != 0) {
    if (pthru->cmd_status == 12) {
      return set_err(EIO, "megasas_cmd: Device %d does not exist\n", m_disknum);
    }
    return set_err((errno ? errno : EIO), "megasas_cmd result: %d.%d = %d/%d",
                   m_hba, m_disknum, errno,
                   pthru->cmd_status);
  }
  return true;
}
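
For illustration only, here is a minimal sketch of how a caller could avoid
describing an empty buffer at all. The struct and function names are
hypothetical, simplified stand-ins and not the real smartmontools or
megaraid_sas definitions. Whether the firmware accepts a pass-through frame
with sge_count == 0 is an assumption that this thread does not establish.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real megaraid ioctl structures. */
struct sketch_iovec {
    void  *iov_base;
    size_t iov_len;
};

struct sketch_iocpacket {
    uint32_t            sge_count;   /* number of valid entries in sgl[] */
    struct sketch_iovec sgl[16];
};

/*
 * Describe the caller's data buffer in the packet.  A command without a
 * data phase (data_len == 0) gets no scatter/gather entry at all, so the
 * driver never sees a zero-length element.
 */
void sketch_set_data_buffer(struct sketch_iocpacket *uio,
                            void *data, size_t data_len)
{
    memset(uio, 0, sizeof(*uio));
    if (data_len == 0)
        return;                      /* sge_count stays 0 */

    uio->sge_count = 1;
    uio->sgl[0].iov_base = data;
    uio->sgl[0].iov_len = data_len;
}
```

Even with such a userspace change the kernel should still validate the
length, since any ioctl caller can supply a zero.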



The kernel bug is that the zero-valued sgl[0].iov_len is passed
unmodified to megasas_mgmt_fw_ioctl(), which in turn passes it on as the
size argument to dma_alloc_coherent():

        /*
         * For each user buffer, create a mirror buffer and copy in
         */
        for (i = 0; i < ioc->sge_count; i++) {
                kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
                                                    ioc->sgl[i].iov_len,
                                                    &buf_handle, GFP_KERNEL);



And it looks like most (all?) of the dma_alloc_coherent()
implementations will use get_order(size) to compute the necessary
allocation. get_order() does not yield a sensible order for size == 0,
so the allocation will fail.
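
To make the size == 0 case concrete, here is a userspace model of the loop
used by the generic get_order() helper. It is a sketch, not the kernel
source: the name sketch_get_order and the 4 KiB page size are assumptions,
and the helper in a given kernel version may differ in detail.

```c
#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/*
 * Userspace model of the generic get_order() loop: the smallest order
 * such that (1 << order) pages cover `size` bytes.
 */
int sketch_get_order(unsigned long size)
{
    int order = -1;

    /* size == 0 wraps around to ULONG_MAX here */
    size = (size - 1) >> (SKETCH_PAGE_SHIFT - 1);
    do {
        size >>= 1;
        order++;
    } while (size);

    return order;
}
```

In this model a size of 4096 gives order 0 and 4097 gives order 1, but a
size of 0 wraps around and gives a very large order (52 with a 64-bit
unsigned long). A request of that order is far beyond what the page
allocator can satisfy, which appears consistent with the
__alloc_pages_slowpath warning in the attached dmesg.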

On the other hand, I may have misunderstood this entirely....

But if you dare, you could try the attached patch (compile tested only
as I don't have the hardware) and see if it helps.  Let me know how it
goes, and I'll forward it to the megaraid maintainers if it really fixes
your problem.




Bjørn


[-- Attachment #2: 0001-SCSI-megaraid_sas-Sanity-check-user-supplied-length-.patch --]
[-- Type: text/x-diff, Size: 4421 bytes --]

From 110053460ce4b33097b7aefe4a7e60c5494b6014 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no>
Date: Thu, 11 Nov 2010 14:54:11 +0100
Subject: [PATCH] [SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The ioc->sgl[i].iov_len value is supplied by the ioctl caller, and can be
zero in some cases.  Assume that's valid and continue without error.

Fixes:

[   69.162538] ------------[ cut here ]------------
[   69.162806] kernel BUG at /build/buildd/linux-2.6.32/lib/swiotlb.c:368!
[   69.163134] invalid opcode: 0000 [#1] SMP
[   69.163570] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[   69.163975] CPU 0
[   69.164227] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate ioatdma radeon ttm drm_kms_helper shpchp drm i2c_algo_bit lp parport floppy pata_jmicron megaraid_sas igb dca
[   69.167419] Pid: 1206, comm: smartctl Tainted: G        W  2.6.32-25-server #45-Ubuntu X8DTN
[   69.167843] RIP: 0010:[<ffffffff812c4dc5>]  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.168370] RSP: 0018:ffff88081c0ebc58  EFLAGS: 00010246
[   69.168655] RAX: 000000000003bffc RBX: 00000000ffffffff RCX: 0000000000000002
[   69.169000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88001dffe000
[   69.169346] RBP: ffff88081c0ebcb8 R08: 0000000000000000 R09: ffff880000030840
[   69.169691] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000000
[   69.170036] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000200000
[   69.170382] FS:  00007fb8de189720(0000) GS:ffff88001de00000(0000) knlGS:0000000000000000
[   69.170794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.171094] CR2: 00007fb8dd59237c CR3: 000000081a790000 CR4: 00000000000006f0
[   69.171439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   69.171784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   69.172130] Process smartctl (pid: 1206, threadinfo ffff88081c0ea000, task ffff88081a760000)
[   69.194513] Stack:
[   69.205788]  0000000000000034 00000002817e3390 0000000000000000 ffff88081c0ebe00
[   69.217739] <0> 0000000000000000 000000000003bffc 0000000000000000 0000000000000000
[   69.241250] <0> 0000000000000000 00000000ffffffff ffff88081c5b4080 ffff88081c0ebe00
[   69.277310] Call Trace:
[   69.289278]  [<ffffffff812c52ac>] swiotlb_alloc_coherent+0xec/0x130
[   69.301118]  [<ffffffff81038b31>] x86_swiotlb_alloc_coherent+0x61/0x70
[   69.313045]  [<ffffffffa002d0ce>] megasas_mgmt_fw_ioctl+0x1ae/0x690 [megaraid_sas]
[   69.336399]  [<ffffffffa002d748>] megasas_mgmt_ioctl_fw+0x198/0x240 [megaraid_sas]
[   69.359346]  [<ffffffffa002f695>] megasas_mgmt_ioctl+0x35/0x50 [megaraid_sas]
[   69.370902]  [<ffffffff81153b12>] vfs_ioctl+0x22/0xa0
[   69.382322]  [<ffffffff8115da2a>] ? alloc_fd+0x10a/0x150
[   69.393622]  [<ffffffff81153cb1>] do_vfs_ioctl+0x81/0x410
[   69.404696]  [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0
[   69.415761]  [<ffffffff811540c1>] sys_ioctl+0x81/0xa0
[   69.426640]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[   69.437491] Code: fe ff ff 48 8b 3d 74 38 76 00 41 bf 00 00 20 00 e8 51 f5 d7 ff 83 e0 ff 48 05 ff 07 00 00 48 c1 e8 0b 48 89 45 c8 e9 13 fe ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89
[   69.478216] RIP  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.489668]  RSP <ffff88081c0ebc58>
[   69.500975] ---[ end trace 6a2181b634e2abc7 ]---

Reported-by: Bokhan Artem <aptem@ngs.ru>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@kernel.org
---
 drivers/scsi/megaraid/megaraid_sas.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.c b/drivers/scsi/megaraid/megaraid_sas.c
index eb29d50..72713c5 100644
--- a/drivers/scsi/megaraid/megaraid_sas.c
+++ b/drivers/scsi/megaraid/megaraid_sas.c
@@ -4359,6 +4359,11 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 	 * For each user buffer, create a mirror buffer and copy in
 	 */
 	for (i = 0; i < ioc->sge_count; i++) {
+		if (ioc->sgl[i].iov_len == 0) {
+			kbuff_arr[i] = NULL;
+			continue;
+		}
+
 		kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
 						    ioc->sgl[i].iov_len,
 						    &buf_handle, GFP_KERNEL);
-- 
1.7.2.3
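
The guard in the patch above can be modelled in userspace as follows. This
is only a sketch: malloc() stands in for dma_alloc_coherent(), and the
names sketch_sge and sketch_alloc_mirrors are hypothetical, not driver
code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one user-supplied scatter/gather element. */
struct sketch_sge {
    size_t iov_len;
};

/*
 * Allocate one mirror buffer per scatter/gather element, skipping
 * zero-length elements.  malloc() stands in for dma_alloc_coherent().
 * Returns 0 on success, -1 if an allocation fails (buffers allocated
 * so far are released).
 */
int sketch_alloc_mirrors(const struct sketch_sge *sgl, size_t count,
                         void **kbuff)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (sgl[i].iov_len == 0) {
            kbuff[i] = NULL;         /* nothing to mirror for this element */
            continue;
        }
        kbuff[i] = malloc(sgl[i].iov_len);
        if (!kbuff[i]) {
            while (i > 0)
                free(kbuff[--i]);    /* free(NULL) is a no-op */
            return -1;
        }
    }
    return 0;
}
```

Any later code that walks the buffer array (copy-back, cleanup) has to
tolerate the NULL entries that the skipped elements leave behind.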



* [PATCH] [SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent()
  2010-11-10 17:21   ` Bokhan Artem
@ 2010-11-11 18:02     ` Bjørn Mork
  0 siblings, 0 replies; 7+ messages in thread
From: Bjørn Mork @ 2010-11-11 18:02 UTC (permalink / raw)
  To: linux-scsi, megaraidlinux
  Cc: Bokhan Artem, linux-ide, Bjørn Mork, stable

The ioc->sgl[i].iov_len value is supplied by the ioctl caller, and can be
zero in some cases.  Assume that's valid and continue without error.

Fixes:

[   69.162538] ------------[ cut here ]------------
[   69.162806] kernel BUG at /build/buildd/linux-2.6.32/lib/swiotlb.c:368!
[   69.163134] invalid opcode: 0000 [#1] SMP
[   69.163570] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[   69.163975] CPU 0
[   69.164227] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate ioatdma radeon ttm drm_kms_helper shpchp drm i2c_algo_bit lp parport floppy pata_jmicron megaraid_sas igb dca
[   69.167419] Pid: 1206, comm: smartctl Tainted: G        W  2.6.32-25-server #45-Ubuntu X8DTN
[   69.167843] RIP: 0010:[<ffffffff812c4dc5>]  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.168370] RSP: 0018:ffff88081c0ebc58  EFLAGS: 00010246
[   69.168655] RAX: 000000000003bffc RBX: 00000000ffffffff RCX: 0000000000000002
[   69.169000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88001dffe000
[   69.169346] RBP: ffff88081c0ebcb8 R08: 0000000000000000 R09: ffff880000030840
[   69.169691] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000000
[   69.170036] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000200000
[   69.170382] FS:  00007fb8de189720(0000) GS:ffff88001de00000(0000) knlGS:0000000000000000
[   69.170794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.171094] CR2: 00007fb8dd59237c CR3: 000000081a790000 CR4: 00000000000006f0
[   69.171439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   69.171784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   69.172130] Process smartctl (pid: 1206, threadinfo ffff88081c0ea000, task ffff88081a760000)
[   69.194513] Stack:
[   69.205788]  0000000000000034 00000002817e3390 0000000000000000 ffff88081c0ebe00
[   69.217739] <0> 0000000000000000 000000000003bffc 0000000000000000 0000000000000000
[   69.241250] <0> 0000000000000000 00000000ffffffff ffff88081c5b4080 ffff88081c0ebe00
[   69.277310] Call Trace:
[   69.289278]  [<ffffffff812c52ac>] swiotlb_alloc_coherent+0xec/0x130
[   69.301118]  [<ffffffff81038b31>] x86_swiotlb_alloc_coherent+0x61/0x70
[   69.313045]  [<ffffffffa002d0ce>] megasas_mgmt_fw_ioctl+0x1ae/0x690 [megaraid_sas]
[   69.336399]  [<ffffffffa002d748>] megasas_mgmt_ioctl_fw+0x198/0x240 [megaraid_sas]
[   69.359346]  [<ffffffffa002f695>] megasas_mgmt_ioctl+0x35/0x50 [megaraid_sas]
[   69.370902]  [<ffffffff81153b12>] vfs_ioctl+0x22/0xa0
[   69.382322]  [<ffffffff8115da2a>] ? alloc_fd+0x10a/0x150
[   69.393622]  [<ffffffff81153cb1>] do_vfs_ioctl+0x81/0x410
[   69.404696]  [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0
[   69.415761]  [<ffffffff811540c1>] sys_ioctl+0x81/0xa0
[   69.426640]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[   69.437491] Code: fe ff ff 48 8b 3d 74 38 76 00 41 bf 00 00 20 00 e8 51 f5 d7 ff 83 e0 ff 48 05 ff 07 00 00 48 c1 e8 0b 48 89 45 c8 e9 13 fe ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89
[   69.478216] RIP  [<ffffffff812c4dc5>] map_single+0x255/0x260
[   69.489668]  RSP <ffff88081c0ebc58>
[   69.500975] ---[ end trace 6a2181b634e2abc7 ]---

Reported-by: Bokhan Artem <aptem@ngs.ru>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@kernel.org
---
 drivers/scsi/megaraid/megaraid_sas.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.c b/drivers/scsi/megaraid/megaraid_sas.c
index eb29d50..72713c5 100644
--- a/drivers/scsi/megaraid/megaraid_sas.c
+++ b/drivers/scsi/megaraid/megaraid_sas.c
@@ -4359,6 +4359,11 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 	 * For each user buffer, create a mirror buffer and copy in
 	 */
 	for (i = 0; i < ioc->sge_count; i++) {
+		if (ioc->sgl[i].iov_len == 0) {
+			kbuff_arr[i] = NULL;
+			continue;
+		}
+
 		kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
 						    ioc->sgl[i].iov_len,
 						    &buf_handle, GFP_KERNEL);
-- 
1.7.2.3



* Re: kernel problems with smart on LSI-92xx
  2010-11-11 14:06 ` Bjørn Mork
  2010-11-10 17:21   ` Bokhan Artem
@ 2010-11-26 15:42   ` Artem Bokhan
  2010-11-29  7:09     ` Bjørn Mork
  1 sibling, 1 reply; 7+ messages in thread
From: Artem Bokhan @ 2010-11-26 15:42 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-ide

   It appears that passing a self-test command to a SATA disk does not work
either. Although no error is reported, the self-test simply does not start.

smartctl -d megaraid,10 -t long /dev/sg0

smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sg0 [megaraid_disk_10] [SAT]: Device open changed type from 'megaraid' to 'sat'
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in 
off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line 
mode" successful.
Testing has begun.
Please wait 255 minutes for test to complete.
Test will complete after Sat Nov 27 01:54:44 2010

Use smartctl -X to abort test.


11.11.2010 20:06, Bjørn Mork wrote:
> Bokhan Artem<aptem@ngs.ru>  writes:
>
>> Hello.
>>
>> I have kernel problems when trying to run smart commands on LSI-92xx
>> controller.
>>
>> Running self-test  on SAS disk (smartctl /dev/sda -dmegaraid,24 -t
>> long) with smartmontools causes kernel oops (?) (and segfault). Look
>> in attachment for dmesg.
>>
>> strace of smartctl:
>>
>> mknod("/dev/megaraid_sas_ioctl_node", S_IFCHR, makedev(251, 0)) = -1 EEXIST
>> (File exists)
>> close(4)                                = 0
>> munmap(0x7f4be3a9f000, 4096)            = 0
>> open("/dev/megaraid_sas_ioctl_node", O_RDWR) = 4
>> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574ed30) = 0
>> ioctl(4, MTRRIOC_SET_ENTRY, 0x7fffa574eb30) = 0
>> ioctl(4, MTRRIOC_SET_ENTRY<unfinished ...>
>> +++ killed by SIGSEGV +++
>>
>>
>> Viewing  smart info is OK (smartctl /dev/sda -dmegaraid,24 -a).
>> Running self-test on SATA disk on the same system is OK.
>>
>> The problem is reproducible with 2.6.32 and 2.6.36 kernels.
> A quick look at this reveals that smartctl will happily do a
> MEGASAS_IOC_FIRMWARE ioctl with sge_count = 1 and sgl[0].iov_len = 0 if
> it is sending a command with dataLen == 0 :
>
>
> /* Issue passthrough scsi command to PERC5/6 controllers */
> bool linux_megaraid_device::megasas_cmd(int cdbLen, void *cdb,
>    int dataLen, void *data,
>    int /*senseLen*/, void * /*sense*/, int /*report*/)
> {
>    struct megasas_pthru_frame    *pthru;
>    struct megasas_iocpacket      uio;
>    int rc;
>
>    memset(&uio, 0, sizeof(uio));
>    pthru = (struct megasas_pthru_frame *)uio.frame.raw;
>    pthru->cmd = MFI_CMD_PD_SCSI_IO;
>    pthru->cmd_status = 0xFF;
>    pthru->scsi_status = 0x0;
>    pthru->target_id = m_disknum;
>    pthru->lun = 0;
>    pthru->cdb_len = cdbLen;
>    pthru->timeout = 0;
>    pthru->flags = MFI_FRAME_DIR_READ;
>    pthru->sge_count = 1;
>    pthru->data_xfer_len = dataLen;
>    pthru->sgl.sge32[0].phys_addr = (intptr_t)data;
>    pthru->sgl.sge32[0].length = (uint32_t)dataLen;
>    memcpy(pthru->cdb, cdb, cdbLen);
>
>    uio.host_no = m_hba;
>    uio.sge_count = 1;
>    uio.sgl_off = offsetof(struct megasas_pthru_frame, sgl);
>    uio.sgl[0].iov_base = data;
>    uio.sgl[0].iov_len = dataLen;
>
>    rc = 0;
>    errno = 0;
>    rc = ioctl(m_fd, MEGASAS_IOC_FIRMWARE, &uio);
>    if (pthru->cmd_status || rc != 0) {
>      if (pthru->cmd_status == 12) {
>        return set_err(EIO, "megasas_cmd: Device %d does not exist\n", m_disknum);
>      }
>      return set_err((errno ? errno : EIO), "megasas_cmd result: %d.%d = %d/%d",
>                     m_hba, m_disknum, errno,
>                     pthru->cmd_status);
>    }
>    return true;
> }
>
>
>
> The kernel bug is that the zero valued sgl[0].iov_len is passed
> unmodified to megasas_mgmt_fw_ioctl() which again passes it on as size
> to dma_alloc_coherent():
>
>          /*
>           * For each user buffer, create a mirror buffer and copy in
>           */
>          for (i = 0; i < ioc->sge_count; i++) {
>                  kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
>                                                      ioc->sgl[i].iov_len,
>                                                      &buf_handle, GFP_KERNEL);
>
>
>
> And it looks like most (all?) of the dma_alloc_coherent()
> implementations will use get_order(size) to compute the necessary
> allocation. This will fail if size == 0.
>
> On the other hand, I may have misunderstood this entirely....
>
> But if you dare, you could try the attached patch (compile tested only
> as I don't have the hardware) and see if it helps.  Let me know how it
> goes, and I'll forward it to the megaraid maintainers if it really fixes
> your problem.
>
>
>
>
> Bjørn
>



* Re: kernel problems with smart on LSI-92xx
  2010-11-26 15:42   ` kernel problems with smart on LSI-92xx Artem Bokhan
@ 2010-11-29  7:09     ` Bjørn Mork
  2010-11-29  9:49       ` Artem Bokhan
  0 siblings, 1 reply; 7+ messages in thread
From: Bjørn Mork @ 2010-11-29  7:09 UTC (permalink / raw)
  To: Artem Bokhan; +Cc: linux-ide

Artem Bokhan <aptem@ngs.ru> writes:

>   It appears that passing a self-test command to a SATA disk does not
> work either. Although no error is reported, the self-test simply does
> not start.
>
> smartctl -d megaraid,10 -t long /dev/sg0
>
> smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> /dev/sg0 [megaraid_disk_10] [SAT]: Device open changed type from 'megaraid' to 'sat'
> === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
> Sending command: "Execute SMART Extended self-test routine immediately
> in off-line mode".
> Drive command "Execute SMART Extended self-test routine immediately in
> off-line mode" successful.
> Testing has begun.
> Please wait 255 minutes for test to complete.
> Test will complete after Sat Nov 27 01:54:44 2010
>
> Use smartctl -X to abort test.

I'm not sure I understand you correctly.  The above output looks
normal.  You mean that "smartctl -l selftest" doesn't show any test in
progress or finished? 


Bjørn


* Re: kernel problems with smart on LSI-92xx
  2010-11-29  7:09     ` Bjørn Mork
@ 2010-11-29  9:49       ` Artem Bokhan
  0 siblings, 0 replies; 7+ messages in thread
From: Artem Bokhan @ 2010-11-29  9:49 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-ide

  29.11.2010 13:09, Bjørn Mork wrote:
> You mean that "smartctl -l selftest" doesn't show any test in
> progress or finished?
Yes, self-test does not start.


>
> Bjørn



Thread overview: 7+ messages
2010-11-09 19:27 kernel problems with smart on LSI-92xx Bokhan Artem
2010-11-11 14:06 ` Bjørn Mork
2010-11-10 17:21   ` Bokhan Artem
2010-11-11 18:02     ` [PATCH] [SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent() Bjørn Mork
2010-11-26 15:42   ` kernel problems with smart on LSI-92xx Artem Bokhan
2010-11-29  7:09     ` Bjørn Mork
2010-11-29  9:49       ` Artem Bokhan
