linux-scsi.vger.kernel.org archive mirror
* [Bug 112241] New: Under heavy load FC TARGET going to Oops
From: bugzilla-daemon @ 2016-02-10  7:32 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=112241

            Bug ID: 112241
           Summary: Under heavy load FC TARGET going to Oops
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 4.3.3
          Hardware: Intel
                OS: Linux
              Tree: Fedora
            Status: NEW
          Severity: high
          Priority: P1
         Component: QLOGIC QLA2XXX
          Assignee: scsi_drivers-qla2xxx@kernel-bugs.osdl.org
          Reporter: anthony.bloodoff@gmail.com
        Regression: No

Created attachment 203261
  --> https://bugzilla.kernel.org/attachment.cgi?id=203261&action=edit
Kernel stacktrace

The storage server runs Fedora Linux with a QLogic Corp. ISP2532-based 8Gb
Fibre Channel to PCI Express HBA and exports an Adaptec RAID6 volume, with
bcache on an Intel SSD, to VMware 5. Under heavy load (for example, when
migrating a VM off the storage), the system Oopses.

Stack trace attached.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 112241] Under heavy load FC TARGET going to Oops
From: bugzilla-daemon @ 2016-02-29  2:24 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=112241

--- Comment #1 from Anthony <anthony.bloodoff@gmail.com> ---
Created attachment 206351
  --> https://bugzilla.kernel.org/attachment.cgi?id=206351&action=edit
Screenshot with call trace for kernel 4.5.0


* [Bug 112241] Under heavy load FC TARGET going to Oops
From: bugzilla-daemon @ 2016-02-29  2:26 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=112241

Anthony <anthony.bloodoff@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|4.3.3                       |4.5.0

--- Comment #2 from Anthony <anthony.bloodoff@gmail.com> ---
With kernel 4.5.0 on the target, the system hangs after clients connect to it.


* Re: [Bug 112241] Under heavy load FC TARGET going to Oops
From: Nicholas A. Bellinger @ 2016-03-01  5:16 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi, target-devel

Hi Anthony,

On Mon, 2016-02-29 at 02:26 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=112241
> 
> Anthony <anthony.bloodoff@gmail.com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Kernel Version|4.3.3                       |4.5.0
> 
> --- Comment #2 from Anthony <anthony.bloodoff@gmail.com> ---
> With kernel 4.5.0 on target, system hang after clients connects to target.
> 

So there are two things going on here.

First, the BUG_ON your ESX <-> LIO FC setup triggered has been addressed
recently in v4.5-rc4 and later kernels with the following series:

http://www.spinics.net/lists/target-devel/msg11822.html

Note that these patches will be making their way back to earlier stable
kernels over the coming weeks.

However, this specific bug is ultimately a consequence of a larger ESX
v5.5u2+ host-side issue: the Atomic Test and Set (ATS) heartbeat is enabled
by default for all VMFS5 mounts:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113956

Other folks have been hitting this recently; here's some extra
background:

http://permalink.gmane.org/gmane.linux.scsi.target.devel/11574
http://www.spinics.net/lists/target-devel/msg12124.html

Note this affects all targets with VAAI ATS (including EMC, IBM, 3PAR,
SolidFire, etc.) and the current solution for ESX v5.5u2+ is to either:

  - Explicitly disable ATS heartbeat usage on all VMFS5 mounts as
    described in the VMware KB article, or:
  - Explicitly disable all ATS logic completely in LIO using
    emulate_caw=0 on all backends connected to ESX v5.5u2+ hosts
    with VMFS5.
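
For the second option, a minimal sketch of flipping emulate_caw through
configfs might look like the following. It assumes the usual LIO layout of
/sys/kernel/config/target/core/<hba>/<device>/attrib/emulate_caw; the
function name and the core_root parameter are illustrative only. Note that
it touches every backend it finds, not only those exported to ESX hosts, so
narrow the glob if that is too broad for your setup.

```python
from pathlib import Path
from typing import List

# Assumed configfs mount point for the LIO target core.
TARGET_CORE = Path("/sys/kernel/config/target/core")


def disable_emulate_caw(core_root: Path = TARGET_CORE) -> List[Path]:
    """Write 0 to attrib/emulate_caw for every backend under core_root.

    Returns the attribute files whose value was actually changed.
    """
    changed = []
    # Assumed layout: <core_root>/<hba>/<device>/attrib/emulate_caw
    for attr in sorted(core_root.glob("*/*/attrib/emulate_caw")):
        if attr.read_text().strip() != "0":
            attr.write_text("0\n")
            changed.append(attr)
    return changed
```

Run as root on the target, calling disable_emulate_caw() with no arguments
walks the live configfs tree. A value written this way is probably not
persistent across a reboot or target restart unless the running
configuration is also saved with whatever tool manages it.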

You can google for 'esx ats heartbeat bug' to see the gory details.

Thanks for reporting!

--nab


