Linux RDMA and InfiniBand development


* Re: [PATCH V2 13/22] bnxt_re: Support QP verbs
From: Selvin Xavier @ 2016-12-13  6:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma, netdev, Eddie Wai, Devesh Sharma,
	Somnath Kotur, Sriharsha Basavapatna
In-Reply-To: <20161212182737.GC8204-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

On Mon, Dec 12, 2016 at 11:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> It can help to review if you break this function into smaller pieces and
> get rid of switch->switch->if construction.

Thanks, Leon. I will address this and your previous comments in the v3 patch set.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  6:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma, netdev
In-Reply-To: <20161212170701.GA28387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Mon, Dec 12, 2016 at 10:37 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sat, Dec 10, 2016 at 11:06:58AM +0530, Selvin Xavier wrote:
>> On Fri, Dec 9, 2016 at 12:17 PM, Selvin Xavier
>> <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>> > I am preparing a git repository with these changes as per Jason's
>> > comment and will share the details later today.
>>
>> Please use bnxt_re branch in this git repository.
>>
>> https://github.com/Broadcom/linux-rdma-nxt.git
>
> Why are you using __packed in bnxt_re_uverbs_abi.h ? that doesn't seem
> necessary. It is a good idea to make sure all those structures are a
> multiple of 64 bits (add explicit reserved fields), and make sure you
> test 32 bit verbs as well.

Will take care in v3.
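
For illustration, a minimal sketch of the layout Jason describes: no
__packed, explicit reserved fields, and a size that is a multiple of 64
bits. The structure and field names below are hypothetical and are not
taken from bnxt_re_uverbs_abi.h. Without the reserved field, a 32-bit x86
build would typically lay this structure out in 12 bytes and a 64-bit
build in 16, which is the kind of mismatch the explicit padding avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user/kernel response structure (names are illustrative).
 * The explicit reserved word keeps the 64-bit member naturally aligned,
 * so 32-bit and 64-bit builds agree on the layout without __packed.
 */
struct example_qp_resp {
	uint32_t qp_id;
	uint32_t rsvd;      /* explicit padding, expected to be zero */
	uint64_t db_offset;
};

/* Compile-time checks: size is a multiple of 64 bits, offsets are fixed. */
_Static_assert(sizeof(struct example_qp_resp) % 8 == 0,
	       "ABI struct must be a multiple of 64 bits");
_Static_assert(offsetof(struct example_qp_resp, db_offset) == 8,
	       "64-bit member must sit on an 8-byte boundary");
```

Building the same header with -m32 and -m64 and comparing sizeof/offsetof
is one cheap way to cover the "test 32 bit verbs" part of the comment.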

>
> Why are you using debugfs just to export counters? Isn't the core code
> counter framework good enough?

I agree that some of the counters exported by this patch set (tx and
rx bytes/pkts etc.) can be exported through the core counters. I will
try adding this support in v3; if not, I will post it as a separate
patch. debugfs was introduced more for the future, in case any
HW-specific data needs to be displayed. As of now, it tracks only the
count of resources (CQs/MRs/QPs) active at any given point, so it is OK
to skip this patch from this series.

>
> Please try and avoid writing functions as defines (eg rdev_to_dev,
> to_bnxt_re, SQE_PG, RCFW_CMDQ_COOKIE, PTR_PG etc)
>
Sure, will take care in v3.
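
As a rough sketch of what this conversion could look like (the types and
names here are hypothetical, not the driver's actual definitions), a
function-like macro that converts an embedded member pointer back to its
container can be written as a static inline, which lets the compiler
check the argument type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types standing in for the driver's structures. */
struct example_ibdev {
	int index;
};

struct example_dev {
	int id;
	struct example_ibdev ibdev;   /* embedded member */
};

/* Macro form: accepts any pointer, so a wrong argument compiles silently. */
#define EXAMPLE_TO_DEV(p) \
	((struct example_dev *)((char *)(p) - offsetof(struct example_dev, ibdev)))

/* Inline form: same arithmetic, but the parameter type is enforced. */
static inline struct example_dev *example_to_dev(struct example_ibdev *ibdev)
{
	return (struct example_dev *)((char *)ibdev -
				      offsetof(struct example_dev, ibdev));
}
```

Passing anything other than a struct example_ibdev pointer to
example_to_dev() draws a compiler diagnostic, whereas the macro form
accepts it.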

> There is something wrong with the tabs and spaces (see
> https://github.com/Broadcom/linux-rdma-nxt/blob/03e23b087f7e86ea28656273994e065827210ce5/drivers/infiniband/hw/bnxtre/bnxt_re_hsi.h)
>
> FWIW, I really dislike the column alignment style, it is so hard to
> maintain..
This file contains the macro defines for the FW/HW structures, which are
auto-generated. Some of these auto-generated defines are very long,
which makes the lines longer than 80 characters. I will fix whatever is
possible and include it in the v3 set.

>
> Jason

Thanks,
Selvin

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  4:52 UTC (permalink / raw)
  To: jtoppins-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Doug Ledford, linux-rdma, netdev
In-Reply-To: <9cf03e2b-a16d-19ec-a8ce-14f24272bf6a-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, Dec 12, 2016 at 10:24 PM, Jonathan Toppins <jtoppins-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> CHECK   drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c
> drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c:729:6: warning: symbol
> 'bnxt_qplib_cleanup_pkey_tbl' was not declared. Should it be static?
I will fix this warning in the v3 patch set.

>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1015:22: warning: context
> imbalance in 'bnxt_qplib_lock_cqs' - wrong count at exit
> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1030:28: warning: context
> imbalance in 'bnxt_qplib_unlock_cqs' - unexpected unlock
The above two are false positives, since locking and unlocking are
handled in two separate functions. These are wrappers that lock/unlock
both the SQ and RQ CQ locks. Functionally it is OK, since
bnxt_qplib_unlock_cqs is called just after the critical section and
both locks are released in order. I think we can ignore this warning.
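
For reference, a sketch of how paired lock/unlock wrappers are commonly
annotated so that sparse can follow the lock context across the two
functions. The kernel provides __acquires()/__releases() for this; the
EX_* macros, the stand-in lock type, and all names below are hypothetical
so that the example builds outside the kernel. It also assumes the two
locks are distinct objects, and it has not been run through sparse.

```c
#include <assert.h>

/* sparse defines __CHECKER__; a normal compiler sees empty macros. */
#ifdef __CHECKER__
#define EX_ACQUIRES(x) __attribute__((context(x, 0, 1)))
#define EX_RELEASES(x) __attribute__((context(x, 1, 0)))
#define EX_ACQUIRE(x)  __context__(x, 1)
#define EX_RELEASE(x)  __context__(x, -1)
#else
#define EX_ACQUIRES(x)
#define EX_RELEASES(x)
#define EX_ACQUIRE(x)  ((void)0)
#define EX_RELEASE(x)  ((void)0)
#endif

/* Stand-in for a spinlock; only records whether it is held. */
struct ex_lock {
	int held;
};

static void ex_lock_acquire(struct ex_lock *l) EX_ACQUIRES(l)
{
	l->held = 1;
	EX_ACQUIRE(l);
}

static void ex_lock_release(struct ex_lock *l) EX_RELEASES(l)
{
	EX_RELEASE(l);
	l->held = 0;
}

/* Hypothetical QP holding the send-CQ and receive-CQ locks. */
struct ex_qp {
	struct ex_lock scq_lock;
	struct ex_lock rcq_lock;
};

/* Takes both locks; the annotations declare that they stay held on return. */
static void ex_lock_cqs(struct ex_qp *qp)
	EX_ACQUIRES(&qp->scq_lock) EX_ACQUIRES(&qp->rcq_lock)
{
	ex_lock_acquire(&qp->scq_lock);
	ex_lock_acquire(&qp->rcq_lock);
}

/* Drops both locks in the reverse order of acquisition. */
static void ex_unlock_cqs(struct ex_qp *qp)
	EX_RELEASES(&qp->scq_lock) EX_RELEASES(&qp->rcq_lock)
{
	ex_lock_release(&qp->rcq_lock);
	ex_lock_release(&qp->scq_lock);
}
```

With annotations of this kind the imbalance is declared rather than
hidden, so the warnings do not need to be ignored by hand.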


>   MODPOST 2 modules

* Re: [PATCH V2 for-next 00/11] Code improvements & fixes for HNS RoCE driver
From: Doug Ledford @ 2016-12-13  4:34 UTC (permalink / raw)
  To: Salil Mehta
  Cc: xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161115181053.399568-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>


On 11/15/2016 1:10 PM, Salil Mehta wrote:
> This patchset introduces some code improvements and fixes
> for the identified problems in the HNS RoCE driver.
> 
> Lijun Ou (4):
>   IB/hns: Add the interface for querying QP1
>   IB/hns: add self loopback for CM
>   IB/hns: Modify the condition of notifying hardware loopback
>   IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()
> 
> Salil Mehta (1):
>   IB/hns: Fix for Checkpatch.pl comment style errors
> 
> Shaobo Xu (1):
>   IB/hns: Implement the add_gid/del_gid and optimize the GIDs
>     management
> 
> Wei Hu (Xavier) (5):
>   IB/hns: Add code for refreshing CQ CI using TPTR
>   IB/hns: Optimize the logic of allocating memory using APIs
>   IB/hns: Modify the macro for the timeout when cmd process
>   IB/hns: Modify query info named port_num when querying RC QP
>   IB/hns: Change qpn allocation to round-robin mode.
> 
>  drivers/infiniband/hw/hns/hns_roce_alloc.c  |   11 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |    8 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.h    |    7 +-
>  drivers/infiniband/hw/hns/hns_roce_common.h |    2 -
>  drivers/infiniband/hw/hns/hns_roce_cq.c     |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_device.h |   45 ++--
>  drivers/infiniband/hw/hns/hns_roce_eq.c     |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hem.c    |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  267 +++++++++++++++++------
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c   |  311 +++++++--------------------
>  drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
>  drivers/infiniband/hw/hns/hns_roce_pd.c     |    5 +-
>  drivers/infiniband/hw/hns/hns_roce_qp.c     |    2 +-
>  14 files changed, 363 insertions(+), 362 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD

* Re: SR-IOV with mlx4 on ConnectX-2 fails with DMAR errors
From: Joshua McBeth @ 2016-12-13  3:57 UTC (permalink / raw)
  To: linux-rdma
In-Reply-To: <CAN27Ff4RYh3y_45PUxxXhGuDvrrrjm8qe38fj5JPq7oV2QmdYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Regarding this issue, I discovered earlier kernels were working with
SR-IOV enabled and bisected to the commit
4be90bc60df47f6268b594c4fb6c90f0ff2f519f ("IB/mad: Remove
ib_get_dma_mr calls").

However, this commit seems to be part of a larger refactoring patch
set and reverting just this commit does not resolve the DMAR errors
nor enable the 'ib0' interface to function.

On Mon, Dec 12, 2016 at 11:04 AM, Joshua McBeth <joshua.mcbeth-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I am having some issues getting SR-IOV working with a Mellanox
> ConnectX-2 in a Supermicro X8DTH-6F.
>
> The Infiniband adapter has been flashed with the latest compatible
> firmware with SR-IOV enabled and SR-IOV/virtualization is enabled in
> the BIOS and working for other hardware (with 2 gigabit ethernet, 1
> wireless ethernet, 1 nVidia GPU passed through to qemu/kvm guests).
> The Infiniband adapter functions as expected if SR-IOV is not enabled
> in the driver.
>
> When I enable SR-IOV in the mlx4 driver ( mlx4_core.port_type_array=1
> mlx4_core.num_vfs=8 mlx4_core.probe_vf=0 ), the ib0 interface does not
> function.  The link is never reported as up in dmesg or ibstat, but a
> DMAR error is reported around when the link would be expected to come
> up.  Attempts to use ibping result in additional DMAR errors and no
> responses are received by the ibping application.  The DMAR errors are
> reported for a different bus address than the iommu seems to be
> getting configured for so I am thinking this is a driver error.
>
> I have excerpted the issue below.
>
> Here the devices added to the iommu are 0000:05:00.1 - 0000:05:01.0
>
> ------ dmesg excerpts
>
> [   44.410799] mlx4_core 0000:05:00.0: Enabling SR-IOV with 8 VFs
> [   44.512772] pci 0000:05:00.1: [15b3:1002] type 00 class 0x0c0600
> [   44.513052] pci 0000:05:00.1: Max Payload Size set to 256 (was 128, max 256)
> [   44.513520] iommu: Adding device 0000:05:00.1 to group 44
> [   44.513722] mlx4_core: Initializing 0000:05:00.1
> [   44.513891] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)
> [   44.514081] mlx4_core 0000:05:00.1: Skipping virtual function:1
> [   44.514332] pci 0000:05:00.2: [15b3:1002] type 00 class 0x0c0600
> [   44.514604] pci 0000:05:00.2: Max Payload Size set to 256 (was 128, max 256)
> [   44.515047] iommu: Adding device 0000:05:00.2 to group 45
> [   44.515225] mlx4_core: Initializing 0000:05:00.2
> [   44.515388] mlx4_core 0000:05:00.2: enabling device (0000 -> 0002)
> [   44.515572] mlx4_core 0000:05:00.2: Skipping virtual function:2
> ...
> [   44.523297] pci 0000:05:01.0: [15b3:1002] type 00 class 0x0c0600
> [   44.523570] pci 0000:05:01.0: Max Payload Size set to 256 (was 128, max 256)
> [   44.524007] iommu: Adding device 0000:05:01.0 to group 51
> [   44.524194] mlx4_core: Initializing 0000:05:01.0
> [   44.524363] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)
> [   44.524554] mlx4_core 0000:05:01.0: Skipping virtual function:8
> [   44.524746] mlx4_core 0000:05:00.0: Running in master mode
> [   46.867330] mlx4_core 0000:05:00.0: PCIe link speed is 5.0GT/s,
> device supports 5.0GT/s
> [   46.867613] mlx4_core 0000:05:00.0: PCIe link width is x8, device supports x8
> [   46.910736] mlx4_core: Initializing 0000:05:00.1
> [   46.910913] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)
> [   46.911102] mlx4_core 0000:05:00.1: Skipping virtual function:1
> ...
> [   46.915085] mlx4_core: Initializing 0000:05:01.0
> [   46.915257] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)
> [   46.915440] mlx4_core 0000:05:01.0: Skipping virtual function:8
>
> ---
>
> Interface is brought up here by init scripts and what I assume is the
> link state notification seems to be eaten by iommu
>
> The adapter seems to now have the bus address [0000:]05:06.1?
>
> ---
>
>
> [   71.631199] DMAR: DRHD: handling fault status reg 2
> [   71.631204] DMAR: [DMA Read] Request device [05:06.1] fault addr
> c2652b000 [fault reason 02] Present bit in context entry is clear
> [   72.020267] ib0: enabling connected mode will cause multicast packet drops
> [   72.020307] ib0: mtu > 2044 will cause multicast packet drops.
>
> ------
>
> Here I attempt to ibping another node 3 times and each packet results
> in a DMAR error, again with a different bus address than was added to
> the IOMMU:
>
> ------ dmesg excerpt continues
>
> [  103.134429] DMAR: DRHD: handling fault status reg 102
> [  103.134434] DMAR: [DMA Read] Request device [05:06.1] fault addr
> 81b081000 [fault reason 02] Present bit in context entry is clear
> [  105.135927] DMAR: DRHD: handling fault status reg 202
> [  105.136013] DMAR: [DMA Read] Request device [05:06.1] fault addr
> 81b081000 [fault reason 02] Present bit in context entry is clear
> [  107.137479] DMAR: DRHD: handling fault status reg 302
> [  107.137484] DMAR: [DMA Read] Request device [05:06.1] fault addr
> 81b081000 [fault reason 02] Present bit in context entry is clear
>
> ------ uname -a
>
> Linux cuprum 4.8.1-gentoo #1 SMP Sun Dec 11 00:05:06 UTC 2016 x86_64
> Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>
> ------- lspci excerpt with SR-IOV disabled
>
> 05:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe
> 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
> Subsystem: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s
> - IB QDR / 10GigE]
> Flags: bus master, fast devsel, latency 0, IRQ 49, NUMA node 0
> Memory at fae00000 (64-bit, non-prefetchable) [size=1M]
> Memory at f8800000 (64-bit, prefetchable) [size=8M]
> Capabilities: [40] Power Management version 3
> Capabilities: [48] Vital Product Data
> Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
> Capabilities: [60] Express Endpoint, MSI 00
> Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
> Capabilities: [148] Device Serial Number 00-02-c9-03-00-07-7d-2e
> Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
> Kernel driver in use: mlx4_core
> Kernel modules: mlx4_core
>
> ------ full dmesg is attached
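
To make the address mismatch in the excerpts above concrete, here is a
small sketch that packs a PCI device (slot) and function number into the
8-bit devfn value, using the conventional 5-bit slot / 3-bit function
split. The helper name is hypothetical. The eight VFs in the log,
05:00.1 through 05:01.0, map to devfn 1 through 8, while the faulting
requester 05:06.1 maps to devfn 49 (0x31), which is outside that range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a PCI slot (device) number and function number into the 8-bit
 * devfn field: upper 5 bits are the slot, lower 3 bits the function.
 * ex_devfn() is an illustrative helper, not a kernel API.
 */
static inline uint8_t ex_devfn(unsigned int slot, unsigned int func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}
```

For example, ex_devfn(0, 1) is the first VF in the log, ex_devfn(1, 0)
the last one, and ex_devfn(6, 1) the device named in the DMAR faults.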

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  3:54 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma, netdev
In-Reply-To: <23e26353-4317-2836-9f94-d1fc3274a770@redhat.com>

On Tue, Dec 13, 2016 at 5:22 AM, Doug Ledford <dledford@redhat.com> wrote:
>
> There are outstanding review comments to be addressed still yet, and the
> v2 patchset doesn't compile for me in 0day testing.  I'm going to bounce
> this one to 4.11.

I will address all review comments and fix the 0day compilation error
and post a v3 soon.

Thanks,
Selvin Xavier

* Re: iscsi_trx going into D state
From: Robert LeBlanc @ 2016-12-12 23:57 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Zhu Lingshan, linux-rdma, linux-scsi
In-Reply-To: <CAANLjFqoHuSq2SsNZ4J2uvAQGPg0F1tpxeJuAQT1oM1hXQ0wew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Nicholas,

After lots of setbacks and having to give up trying to get kernel
dumps on our "production" systems, I've been able to work out the
issues we had with kdump and replicate the issue on my dev boxes. I
have dumps from 4.4.30 and 4.9-rc8 (makedumpfile would not dump, so it
is a straight copy of /proc/vmcore from the crash kernel). In each
crash directory, I put a details.txt file that has the process IDs
that were having problems and a brief description of the set-up at the
time. This was mostly replicated by starting fio and pulling the
InfiniBand cable until fio gave up. This hardware also has Mellanox
ConnectX-4 Lx cards and I also replicated the issue over RoCE using 4.9
since it has the drivers in-box. Please let me know if you need more
info; I can test much faster now. The cores/kernels/modules are
located at [1].

[1] http://mirrors.betterservers.com/trace/crash.tar.xz

Thanks,
Robert
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Nov 4, 2016 at 3:57 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> We hit this yesterday, this time it was on the tx thread (the other
> ones before seem to be on the rx thread). We weren't able to get a
> kernel dump on this. We'll try to get one next time.
>
> # ps axuw | grep "D.*iscs[i]"
> root     12383  0.0  0.0      0     0 ?        D    Nov03   0:04 [iscsi_np]
> root     23016  0.0  0.0      0     0 ?        D    Nov03   0:00 [iscsi_ttx]
> root     23018  0.0  0.0      0     0 ?        D    Nov03   0:00 [iscsi_ttx]
> # cat /proc/12383/stack
> [<ffffffff814f24af>] iscsit_stop_session+0x19f/0x1d0
> [<ffffffff814e3c66>] iscsi_check_for_session_reinstatement+0x1e6/0x270
> [<ffffffff814e6620>] iscsi_target_check_for_existing_instances+0x30/0x40
> [<ffffffff814e6770>] iscsi_target_do_login+0x140/0x640
> [<ffffffff814e7b0c>] iscsi_target_start_negotiation+0x1c/0xb0
> [<ffffffff814e585b>] iscsi_target_login_thread+0xa9b/0xfc0
> [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70
> [<ffffffffffffffff>] 0xffffffffffffffff
> # cat /proc/23016/stack
> [<ffffffff814ce0d9>] target_wait_for_sess_cmds+0x49/0x1a0
> [<ffffffffa058b92b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
> [<ffffffff814f2642>] iscsit_close_connection+0x162/0x870
> [<ffffffff814e110f>] iscsit_take_action_for_connection_exit+0x7f/0x100
> [<ffffffff814f122a>] iscsi_target_tx_thread+0x1aa/0x1d0
> [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70
> [<ffffffffffffffff>] 0xffffffffffffffff
> # cat /proc/23018/stack
> [<ffffffff814ce0d9>] target_wait_for_sess_cmds+0x49/0x1a0
> [<ffffffffa058b92b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
> [<ffffffff814f2642>] iscsit_close_connection+0x162/0x870
> [<ffffffff814e110f>] iscsit_take_action_for_connection_exit+0x7f/0x100
> [<ffffffff814f122a>] iscsi_target_tx_thread+0x1aa/0x1d0
> [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> From dmesg:
> [  394.476332] INFO: rcu_sched self-detected stall on CPU
> [  394.476334]  20-...: (23976 ticks this GP)
> idle=edd/140000000000001/0 softirq=292/292 fqs=18788
> [  394.476336]   (t=24003 jiffies g=3146 c=3145 q=0)
> [  394.476337] Task dump for CPU 20:
> [  394.476338] kworker/u68:2   R  running task        0 12906      2 0x00000008
> [  394.476345] Workqueue: isert_comp_wq isert_cq_work [ib_isert]
> [  394.476346]  ffff883f2fe38000 00000000f805705e ffff883f7fd03da8
> ffffffff810ac8ff
> [  394.476347]  0000000000000014 ffffffff81adb680 ffff883f7fd03dc0
> ffffffff810af239
> [  394.476348]  0000000000000015 ffff883f7fd03df0 ffffffff810e1cd0
> ffff883f7fd17b80
> [  394.476348] Call Trace:
> [  394.476354]  <IRQ>  [<ffffffff810ac8ff>] sched_show_task+0xaf/0x110
> [  394.476355]  [<ffffffff810af239>] dump_cpu_task+0x39/0x40
> [  394.476357]  [<ffffffff810e1cd0>] rcu_dump_cpu_stacks+0x80/0xb0
> [  394.476359]  [<ffffffff810e6100>] rcu_check_callbacks+0x540/0x820
> [  394.476360]  [<ffffffff810afe11>] ? account_system_time+0x81/0x110
> [  394.476363]  [<ffffffff810faa60>] ? tick_sched_do_timer+0x50/0x50
> [  394.476364]  [<ffffffff810eb599>] update_process_times+0x39/0x60
> [  394.476365]  [<ffffffff810fa815>] tick_sched_handle.isra.17+0x25/0x60
> [  394.476366]  [<ffffffff810faa9d>] tick_sched_timer+0x3d/0x70
> [  394.476368]  [<ffffffff810ec182>] __hrtimer_run_queues+0x102/0x290
> [  394.476369]  [<ffffffff810ec668>] hrtimer_interrupt+0xa8/0x1a0
> [  394.476372]  [<ffffffff81052c65>] local_apic_timer_interrupt+0x35/0x60
> [  394.476374]  [<ffffffff8172423d>] smp_apic_timer_interrupt+0x3d/0x50
> [  394.476376]  [<ffffffff817224f7>] apic_timer_interrupt+0x87/0x90
> [  394.476379]  <EOI>  [<ffffffff810d71be>] ? console_unlock+0x41e/0x4e0
> [  394.476380]  [<ffffffff810d757c>] vprintk_emit+0x2fc/0x500
> [  394.476382]  [<ffffffff810d78ff>] vprintk_default+0x1f/0x30
> [  394.476384]  [<ffffffff81174dde>] printk+0x5d/0x74
> [  394.476388]  [<ffffffff814bce21>] transport_lookup_cmd_lun+0x1d1/0x200
> [  394.476390]  [<ffffffff814ee8c0>] iscsit_setup_scsi_cmd+0x230/0x540
> [  394.476392]  [<ffffffffa058dbf3>] isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
> [  394.476394]  [<ffffffffa058e174>] isert_cq_work+0x184/0x770 [ib_isert]
> [  394.476396]  [<ffffffff8109740f>] process_one_work+0x14f/0x400
> [  394.476397]  [<ffffffff81097c84>] worker_thread+0x114/0x470
> [  394.476398]  [<ffffffff8171d32a>] ? __schedule+0x34a/0x7f0
> [  394.476399]  [<ffffffff81097b70>] ? rescuer_thread+0x310/0x310
> [  394.476400]  [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> [  394.476402]  [<ffffffff8109d6f0>] ? kthread_park+0x60/0x60
> [  394.476403]  [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70
> [  394.476404]  [<ffffffff8109d6f0>] ? kthread_park+0x60/0x60
> [  405.716632] Unexpected ret: -104 send data 360
> [  405.721711] tx_data returned -32, expecting 360.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Oct 31, 2016 at 10:34 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> Nicholas,
>>
>> Thanks for following up on this. We have been chasing other bugs in
>> our provisioning, which has reduced the load on the boxes. We are
>> hoping to get that all straightened out this week and do some more
>> testing. So far we have not had any iSCSI in D state since the patch,
>> but we haven't been able to test it well either. We will keep you
>> updated.
>>
>> Thank you,
>> Robert LeBlanc
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Sat, Oct 29, 2016 at 4:29 PM, Nicholas A. Bellinger
>> <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
>>> Hi Robert,
>>>
>>> On Wed, 2016-10-19 at 10:41 -0600, Robert LeBlanc wrote:
>>>> Nicholas,
>>>>
>>>> I didn't have high hopes for the patch because we were not seeing
>>>> TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it
>>>> seemed to help regardless. Our clients finally OOMed from the hung
>>>> sessions, so we are having to reboot them and we will do some more
>>>> testing. We haven't put the updated kernel on our clients yet. Our
>>>> clients have iSCSI root disks so I'm not sure if we can get a vmcore
>>>> on those, but we will do what we can to get you a vmcore from the
>>>> target if it happens again.
>>>>
>>>
>>> Just checking in to see if you've observed further issues with
>>> iser-target ports, and/or able to generate a crashdump with v4.4.y..?
>>>
>>>> As far as our configuration: It is a Supermicro box with 6 SAMSUNG
>>>> MZ7LM3T8HCJM-00005 SSDs. Two are for root and four are in mdadm
>>>> RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the
>>>> RAID-10 for checksum and snapshots only and we export ZVols to the
>>>> clients (one or more per VM on the client). We do not persist the
>>>> export info (targetcli saveconfig), but regenerate it from scripts.
>>>> The client receives two or more of these exports and puts them in a
>>>> RAID-1 device. The exports are served by iSER on one port and also by
>>>> normal iSCSI on a different port for compatibility, but that is not
>>>> normally used. If you need more info about the config, please let me
>>>> know. It was kind of a vague request so I'm not sure what exactly is
>>>> important to you.
>>>
>>> Thanks for the extra details of your hardware + user-space
>>> configuration.
>>>
>>>> Thanks for helping us with this,
>>>> Robert LeBlanc
>>>>
>>>> When we have problems, we usually see this in the logs:
>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on
>>>> Network Portal 0.0.0.0:3260
>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48
>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48.
>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed.
>>>>
>>>> I found some backtraces in the logs, not sure if this is helpful, this
>>>> is before your patch (your patch booted at Oct 18 10:36:59):
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched
>>>> self-detected stall on CPU
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this
>>>> GP) idle=b59/140000000000001/0 softirq=535/535 fqs=30992
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550
>>>> c=1549 q=0)
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5:
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2   R  running
>>>> task        0 17967      2 0x00000008
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq
>>>> isert_cq_work [ib_isert]
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: ffff883f4c0dca80
>>>> 00000000af8ca7a4 ffff883f7fb43da8 ffffffff810ac83f
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000005
>>>> ffffffff81adb680 ffff883f7fb43dc0 ffffffff810af179
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000006
>>>> ffff883f7fb43df0 ffffffff810e1c10 ffff883f7fb57b80
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace:
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac83f>]
>>>> sched_show_task+0xaf/0x110
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810af179>]
>>>> dump_cpu_task+0x39/0x40
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e1c10>]
>>>> rcu_dump_cpu_stacks+0x80/0xb0
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e6040>]
>>>> rcu_check_callbacks+0x540/0x820
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810afd51>] ?
>>>> account_system_time+0x81/0x110
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9a0>] ?
>>>> tick_sched_do_timer+0x50/0x50
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810eb4d9>]
>>>> update_process_times+0x39/0x60
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa755>]
>>>> tick_sched_handle.isra.17+0x25/0x60
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9dd>]
>>>> tick_sched_timer+0x3d/0x70
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec0c2>]
>>>> __hrtimer_run_queues+0x102/0x290
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec5a8>]
>>>> hrtimer_interrupt+0xa8/0x1a0
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>>>> local_apic_timer_interrupt+0x35/0x60
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8172343d>]
>>>> smp_apic_timer_interrupt+0x3d/0x50
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff817216f7>]
>>>> apic_timer_interrupt+0x87/0x90
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d70fe>]
>>>> ? console_unlock+0x41e/0x4e0
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d74bc>]
>>>> vprintk_emit+0x2fc/0x500
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d783f>]
>>>> vprintk_default+0x1f/0x30
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81174c2a>] printk+0x5d/0x74
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814bc351>]
>>>> transport_lookup_cmd_lun+0x1d1/0x200
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814edcf0>]
>>>> iscsit_setup_scsi_cmd+0x230/0x540
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0890bf3>]
>>>> isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0891174>]
>>>> isert_cq_work+0x184/0x770 [ib_isert]
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109734f>]
>>>> process_one_work+0x14f/0x400
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097bc4>]
>>>> worker_thread+0x114/0x470
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8171c55a>] ?
>>>> __schedule+0x34a/0x7f0
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097ab0>] ?
>>>> rescuer_thread+0x310/0x310
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d708>] kthread+0xd8/0xf0
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>>>> kthread_park+0x60/0x60
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81720c8f>]
>>>> ret_from_fork+0x3f/0x70
>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>>>> kthread_park+0x60/0x60
>>>>
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: INFO: rcu_sched
>>>> self-detected stall on CPU
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #01128-...: (5999 ticks this
>>>> GP) idle=2f9/140000000000001/0 softirq=457/457 fqs=4830
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=3546
>>>> c=3545 q=0)
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Task dump for CPU 28:
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: iscsi_np        R  running
>>>> task        0 16597      2 0x0000000c
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: ffff887f40350000
>>>> 00000000b98a67bb ffff887f7f503da8 ffffffff810ac8ff
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001c
>>>> ffffffff81adb680 ffff887f7f503dc0 ffffffff810af239
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001d
>>>> ffff887f7f503df0 ffffffff810e1cd0 ffff887f7f517b80
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Call Trace:
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>>>> sched_show_task+0xaf/0x110
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>>>> dump_cpu_task+0x39/0x40
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>>>> rcu_dump_cpu_stacks+0x80/0xb0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>>>> rcu_check_callbacks+0x540/0x820
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>>>> account_system_time+0x81/0x110
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>>>> tick_sched_do_timer+0x50/0x50
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>>>> update_process_times+0x39/0x60
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>>>> tick_sched_handle.isra.17+0x25/0x60
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>>>> tick_sched_timer+0x3d/0x70
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>>>> __hrtimer_run_queues+0x102/0x290
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>>>> hrtimer_interrupt+0xa8/0x1a0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>>>> local_apic_timer_interrupt+0x35/0x60
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>>>> smp_apic_timer_interrupt+0x3d/0x50
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>>>> apic_timer_interrupt+0x87/0x90
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>>>> ? console_unlock+0x41e/0x4e0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d757c>]
>>>> vprintk_emit+0x2fc/0x500
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d78ff>]
>>>> vprintk_default+0x1f/0x30
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81174dde>] printk+0x5d/0x74
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e71ad>]
>>>> iscsi_target_locate_portal+0x62d/0x6f0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e5100>]
>>>> iscsi_target_login_thread+0x6f0/0xfc0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e4a10>] ?
>>>> iscsi_target_login_sess_out+0x250/0x250
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>>>> kthread_park+0x60/0x60
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>>>> ret_from_fork+0x3f/0x70
>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>>>> kthread_park+0x60/0x60
>>>>
>>>> I don't think this one is related, but it happened a couple of times:
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: INFO: rcu_sched
>>>> self-detected stall on CPU
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #01119-...: (5999 ticks this
>>>> GP) idle=727/140000000000001/0 softirq=1346/1346 fqs=4990
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=4295
>>>> c=4294 q=0)
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Task dump for CPU 19:
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: kworker/19:1    R  running
>>>> task        0   301      2 0x00000008
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Workqueue:
>>>> events_power_efficient fb_flashcursor
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: ffff883f6009ca80
>>>> 00000000010a7cdd ffff883f7fcc3da8 ffffffff810ac8ff
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000013
>>>> ffffffff81adb680 ffff883f7fcc3dc0 ffffffff810af239
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000014
>>>> ffff883f7fcc3df0 ffffffff810e1cd0 ffff883f7fcd7b80
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Call Trace:
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>>>> sched_show_task+0xaf/0x110
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>>>> dump_cpu_task+0x39/0x40
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>>>> rcu_dump_cpu_stacks+0x80/0xb0
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>>>> rcu_check_callbacks+0x540/0x820
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>>>> account_system_time+0x81/0x110
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>>>> tick_sched_do_timer+0x50/0x50
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>>>> update_process_times+0x39/0x60
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>>>> tick_sched_handle.isra.17+0x25/0x60
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>>>> tick_sched_timer+0x3d/0x70
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>>>> __hrtimer_run_queues+0x102/0x290
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>>>> hrtimer_interrupt+0xa8/0x1a0
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>>>> local_apic_timer_interrupt+0x35/0x60
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>>>> smp_apic_timer_interrupt+0x3d/0x50
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>>>> apic_timer_interrupt+0x87/0x90
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>>>> ? console_unlock+0x41e/0x4e0
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff813866ad>]
>>>> fb_flashcursor+0x5d/0x140
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8138bc00>] ?
>>>> bit_clear+0x110/0x110
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109740f>]
>>>> process_one_work+0x14f/0x400
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097c84>]
>>>> worker_thread+0x114/0x470
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8171cdda>] ?
>>>> __schedule+0x34a/0x7f0
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097b70>] ?
>>>> rescuer_thread+0x310/0x310
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>>>> kthread_park+0x60/0x60
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>>>> ret_from_fork+0x3f/0x70
>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>>>> kthread_park+0x60/0x60
>>>
>>> RCU self-detected schedule stalls typically mean some code is
>>> monopolizing execution on a specific CPU for an extended period of time
>>> (eg: endless loop), preventing normal RCU grace-period callbacks from
>>> running in a timely manner.
>>>
>>> It's hard to tell without more log context and/or crashdump what was
>>> going on here.
>>>

^ permalink raw reply

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Doug Ledford @ 2016-12-12 23:52 UTC (permalink / raw)
  To: Selvin Xavier, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1282 bytes --]

On 12/9/2016 1:47 AM, Selvin Xavier wrote:
> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs. 
> This driver is dependent on the bnxt_en NIC driver and is 
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series are already available in this branch.
> 
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.
> 
> v1-> v2:
>   * The license text in each file updated to reflect Dual license.
>   * Makefile and Kconfig changes are pushed to the last patch
>   * Moved bnxt_re_uverbs_abi.h to include/uapi/rdma folder
>   * Remove duplicate structure definitions from bnxt_re_hsi.h as
>     it is available in the corresponding bnxt_en header file (bnxt_hsi.h)
>   * Removed some unused code reported during code review.
>   * Fixed a few sparse warnings
> 
> Doug,
> Please review and consider applying this to linux-rdma repository.

There are still outstanding review comments to be addressed, and the
v2 patchset doesn't compile for me in 0day testing.  I'm going to bounce
this one to 4.11.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH] i40iw: Use correct src address in memcpy to rdma stats counters
From: Doug Ledford @ 2016-12-12 22:20 UTC (permalink / raw)
  To: Shiraz Saleem
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org
In-Reply-To: <20161206173917.GB15668-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1487 bytes --]

On 12/6/2016 12:39 PM, Shiraz Saleem wrote:
> Doug - Can you please pick this up for 4.9? It was a bug introduced in 4.7. 
> 
> 
> On Fri, Nov 11, 2016 at 09:55:41AM -0700, Saleem, Shiraz wrote:
>> hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats().
>> Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats
>> data into rdma_hw_stats counters.
>>
>> Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic")
>>
>> Signed-off-by: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> Signed-off-by: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>>  drivers/infiniband/hw/i40iw/i40iw_verbs.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>> index b71394b..02c8f9a 100644
>> --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>> +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>> @@ -2498,7 +2498,7 @@ static int i40iw_get_hw_stats(struct ib_device *ibdev,
>>  			return -ENOSYS;
>>  	}
>>  
>> -	memcpy(&stats->value[0], &hw_stats, sizeof(*hw_stats));
>> +	memcpy(&stats->value[0], hw_stats, sizeof(*hw_stats));
>>  
>>  	return stats->num_counters;
>>  }
>> -- 
>> 2.8.0
>>

Picked up for the current kernel and stable tag added.
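
For readers following the fix above, here is a minimal user-space sketch of
why hw_stats and &hw_stats differ as a memcpy source. The structure and
function names (demo_dev_stats, copy_stats) are hypothetical stand-ins and
not the driver's own definitions; only the memcpy pattern mirrors the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's device stats structure. */
struct demo_dev_stats {
	uint64_t counters[4];
};

/*
 * Copy the counters that hw_stats points to into dst.
 * dst must have room for sizeof(*hw_stats) bytes.
 */
static void copy_stats(uint64_t *dst, const struct demo_dev_stats *hw_stats)
{
	/*
	 * hw_stats already holds the address of the structure, so it is
	 * passed as-is. Passing &hw_stats would instead copy
	 * sizeof(*hw_stats) bytes starting at the pointer variable itself,
	 * which yields the pointer value plus whatever follows it in memory.
	 */
	memcpy(dst, hw_stats, sizeof(*hw_stats));
}
```

Calling copy_stats(out, &s) with a populated struct demo_dev_stats s fills
out with the values of s.counters, which is the behaviour the one-line fix
restores.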

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH] i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG
From: Doug Ledford @ 2016-12-12 22:15 UTC (permalink / raw)
  To: Faisal Latif, Leon Romanovsky
  Cc: Thomas Huth, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Chien Tin Tung,
	Mustafa Ismail, Shiraz Saleem, Tatyana Nikolova, Sean Hefty,
	Hal Rosenstock
In-Reply-To: <20161006200220.GB14272@flatif-MOBL1>


[-- Attachment #1.1: Type: text/plain, Size: 1014 bytes --]

On 10/6/2016 4:02 PM, Faisal Latif wrote:
> On Wed, Oct 05, 2016 at 03:57:11PM +0300, Leon Romanovsky wrote:
>> On Wed, Oct 05, 2016 at 01:55:38PM +0200, Thomas Huth wrote:
>>> The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are
>>> apparently bad - they are using the logical "&&" operation which
>>> does not make sense here. It should have been a bitwise "&" instead.
>>> Since the macros seem to be completely unused, let's simply remove
>>> them so that nobody accidentally uses them in the future. And while
>>> we're at it, also remove the unused macro I40IW_CREATE_STAG.
>>>
>>> Signed-off-by: Thomas Huth <thuth-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>
>> Thanks,
>> Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Thanks
> Acked-by: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> 

Applied, thanks.
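
A small sketch of the "&&" versus "&" problem described in the quoted patch.
The mask value and macro names below are hypothetical, since the original
macro bodies are not shown in this thread; the sketch only illustrates the
general difference between the two operators.

```c
#include <stdint.h>

/* Hypothetical mask; the real driver's STag layout is not shown here. */
#define DEMO_STAG_KEY_MASK 0x000000ffu

/* Logical AND: the result is only ever 0 or 1, never the key bits. */
#define DEMO_KEY_LOGICAL(stag) ((stag) && DEMO_STAG_KEY_MASK)

/* Bitwise AND: extracts the bits selected by the mask. */
#define DEMO_KEY_BITWISE(stag) ((stag) & DEMO_STAG_KEY_MASK)

/* Thin wrappers so the two macros can be compared on the same input. */
static uint32_t demo_key_logical(uint32_t stag)
{
	return DEMO_KEY_LOGICAL(stag);
}

static uint32_t demo_key_bitwise(uint32_t stag)
{
	return DEMO_KEY_BITWISE(stag);
}
```

For the input 0x12345678, the logical variant returns 1 while the bitwise
variant returns 0x78, so a macro written with "&&" cannot extract a field.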

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH] i40iw: Reorganize structures to align with HW capabilities
From: Doug Ledford @ 2016-12-12 22:15 UTC (permalink / raw)
  To: Henry Orosco
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <20161206221620.36524-1-henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 602 bytes --]

On 12/6/2016 5:16 PM, Henry Orosco wrote:
> Some resources are incorrectly organized and at odds with
> HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS
> and statistics belong in a VSI.
> 
> Signed-off-by: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Mustafa Ismail <mustafa.ismail-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Henry Orosco <henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Big churn patch applied, thanks ;-)

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH 0/6] i40iw: Fixes for i40iw
From: Doug Ledford @ 2016-12-12 22:14 UTC (permalink / raw)
  To: Henry Orosco
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <20161206214935.41584-1-henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1101 bytes --]

On 12/6/2016 4:49 PM, Henry Orosco wrote:
> The following patch series has fixes for the i40iw driver.
> 
> Mustafa Ismail (4):
>   i40iw: Fix double free of QP
>   i40iw: Fix memory leak in CQP destroy when in reset
>   i40iw: Assign MSS only when it is a new MTU
>   i40iw: Fix incorrect check for error
> 
> Shiraz Saleem (2):
>   i40iw: Fix QP flush to not hang on empty queues or failure
>   i40iw: Fix race condition in terminate timer's handler
> 
>  drivers/infiniband/hw/i40iw/i40iw.h       | 11 +++++++----
>  drivers/infiniband/hw/i40iw/i40iw_cm.c    | 22 +++++++++++++++++-----
>  drivers/infiniband/hw/i40iw/i40iw_hw.c    | 30 +++++++++++++++++++++++++++---
>  drivers/infiniband/hw/i40iw/i40iw_main.c  | 11 +++++------
>  drivers/infiniband/hw/i40iw/i40iw_puda.c  |  2 +-
>  drivers/infiniband/hw/i40iw/i40iw_utils.c |  5 ++++-
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c |  2 +-
>  7 files changed, 62 insertions(+), 21 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH for-next 0/6] IB/hns: Bug Fixes for HNS RoCE Driver
From: Doug Ledford @ 2016-12-12 22:09 UTC (permalink / raw)
  To: Salil Mehta
  Cc: xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk, lijun_nudt,
	linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161129231030.1105600-1-salil.mehta@huawei.com>


[-- Attachment #1.1: Type: text/plain, Size: 1076 bytes --]

On 11/29/2016 6:10 PM, Salil Mehta wrote:
> This patch-set contains bug fixes for the HNS RoCE driver.
> 
> Lijun Ou (1):
>   IB/hns: Fix the IB device name
> 
> Shaobo Xu (2):
>   IB/hns: Fix the bug when free mr
>   IB/hns: Fix the bug when free cq
> 
> Wei Hu (Xavier) (3):
>   IB/hns: Fix the bug when destroy qp
>   IB/hns: Fix the bug of setting port mtu
>   IB/hns: Delete the redundant memset operation
> 
>  drivers/infiniband/hw/hns/hns_roce_cmd.h    |    5 -
>  drivers/infiniband/hw/hns/hns_roce_common.h |   42 ++
>  drivers/infiniband/hw/hns/hns_roce_cq.c     |   27 +-
>  drivers/infiniband/hw/hns/hns_roce_device.h |   18 +
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  967 ++++++++++++++++++++++++---
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   57 ++
>  drivers/infiniband/hw/hns/hns_roce_main.c   |   26 +-
>  drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
>  8 files changed, 1026 insertions(+), 137 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford@redhat.com>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH V3 for-next 00/11] Code improvements & fixes for HNS RoCE driver
From: Doug Ledford @ 2016-12-12 22:09 UTC (permalink / raw)
  To: Salil Mehta
  Cc: xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk, lijun_nudt,
	linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161123194109.420760-1-salil.mehta@huawei.com>


[-- Attachment #1.1: Type: text/plain, Size: 1930 bytes --]

On 11/23/2016 2:40 PM, Salil Mehta wrote:
> This patchset introduces some code improvements and fixes
> for the identified problems in the HNS RoCE driver.
> 
> Lijun Ou (4):
>   IB/hns: Add the interface for querying QP1
>   IB/hns: add self loopback for CM
>   IB/hns: Modify the condition of notifying hardware loopback
>   IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()
> 
> Salil Mehta (1):
>   IB/hns: Fix for Checkpatch.pl comment style errors
> 
> Shaobo Xu (1):
>   IB/hns: Implement the add_gid/del_gid and optimize the GIDs
>     management
> 
> Wei Hu (Xavier) (5):
>   IB/hns: Add code for refreshing CQ CI using TPTR
>   IB/hns: Optimize the logic of allocating memory using APIs
>   IB/hns: Modify the macro for the timeout when cmd process
>   IB/hns: Modify query info named port_num when querying RC QP
>   IB/hns: Change qpn allocation to round-robin mode.
> 
>  drivers/infiniband/hw/hns/hns_roce_alloc.c  |   11 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |    8 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.h    |    7 +-
>  drivers/infiniband/hw/hns/hns_roce_common.h |    2 -
>  drivers/infiniband/hw/hns/hns_roce_cq.c     |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_device.h |   45 ++--
>  drivers/infiniband/hw/hns/hns_roce_eq.c     |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hem.c    |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  267 +++++++++++++++++------
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c   |  311 +++++++--------------------
>  drivers/infiniband/hw/hns/hns_roce_mr.c     |   22 +-
>  drivers/infiniband/hw/hns/hns_roce_pd.c     |    5 +-
>  drivers/infiniband/hw/hns/hns_roce_qp.c     |    2 +-
>  14 files changed, 364 insertions(+), 362 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford@redhat.com>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/2] IB/rxe: Fix kernel panics when tearing down QPs
From: Doug Ledford @ 2016-12-12 21:42 UTC (permalink / raw)
  To: Andrew Boyer, monis-VPRAkNaXOzVWk0Htik3J/w,
	yonatanc-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1480945335-2577-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1067 bytes --]

On 12/5/2016 8:42 AM, Andrew Boyer wrote:
> This is a set of two patches that prevent kernel panics seen when tearing
> down QPs. The second patch (holding refs in tasklets) might or might not be
> needed once the first patch (waiting for tasklets to finish) is applied.
> Feedback welcomed.
> 
> Update for v2:
>  - Remove default initialization of idle in rxe_cleanup_task() per review
> 
> Andrew Boyer (2):
>   IB/rxe: Wait for tasklets to finish before tearing down QP
>   IB/rxe: Hold refs when running tasklets
> 
>  drivers/infiniband/sw/rxe/rxe_comp.c |  4 ++++
>  drivers/infiniband/sw/rxe/rxe_req.c  |  4 ++++
>  drivers/infiniband/sw/rxe/rxe_resp.c |  3 +++
>  drivers/infiniband/sw/rxe/rxe_task.c | 19 +++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_task.h |  1 +
>  5 files changed, 31 insertions(+)
> 

Hi Andrew, I took these patches as well.  I had to fix up the second
one, so you might want to double check it.  Thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/8] RXE improvements
From: Doug Ledford @ 2016-12-12 21:40 UTC (permalink / raw)
  To: Boyer, Andrew, Moni Shoua; +Cc: Yonatan Cohen, linux-rdma, Bart Van Assche
In-Reply-To: <D46ADB9D.7D50%Andrew.Boyer-mb1K0bWo544@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1490 bytes --]

On 12/5/2016 8:58 AM, Boyer, Andrew wrote:
> On 11/23/16, 12:59 PM, "monisonlists-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org on behalf of Moni Shoua"
> <monisonlists-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org on behalf of monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> 
>>> Andrew Boyer (8):
>>>   IB/rxe: Remove buffer used for printing IP address
>>>   IB/rxe: Advance the consumer pointer before posting the CQE
>>>   IB/rxe: Don't update the response PSN unless it's going forwards
>>>   IB/rxe: Unblock loopback by moving skb_out increment
>>>   IB/rxe: Add support for zero-byte operations
>>>   IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS
>>>   IB/rxe: Fix ref leak in rxe_create_qp()
>>>   IB/rxe: Fix ref leak in duplicate_request()
>>
>> Thanks for the series.
>>
>> Please see comment in response for the first patch but except that
>>
>> Acked-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> for everything else
> 
> Hello Moni,
> I saw the comment from Bart in response to the first patch, but I prefer
> it this way because of how much other code is required to take advantage
> of %pIS. If you want it changed to %pIS, we can do that, though. What
> would you prefer?

Hi Andrew,

I took these as they are.  If someone wants to update the code to use
%pIS, it will need to be as an incremental patch.  Thanks.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH v6 0/9] SELinux support for Infiniband RDMA
From: Doug Ledford @ 2016-12-12 21:38 UTC (permalink / raw)
  To: Dan Jurgens, chrisw-69jw2NvuJkxg9hUCZPvPmw,
	paul-r2n+y4ga6xFZroRs9YW3xA, sds-+05T5uksL2qpZYMLLGbcSA,
	eparis-FjpueFixGhCM4zKIHC2jIg, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
  Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <1479910651-43246-1-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 8454 bytes --]

On 11/23/2016 9:17 AM, Dan Jurgens wrote:
> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Infiniband applications access HW from user-space -- traffic is generated
> directly by HW, bypassing the kernel. Consequently, Infiniband Partitions,
> which are associated directly with HW transport endpoints, are a natural
> choice for enforcing granular mandatory access control for Infiniband. QPs may
> only send or receive packets tagged with the corresponding partition key
> (PKey). The PKey is not a cryptographic key; it's a 16 bit number identifying
> the partition.
> 
> Every Infiniband fabric is controlled by a central Subnet Manager (SM). The SM
> provisions the partitions by assigning each port with the partitions it can
> access. In addition, the SM tags each port with a subnet prefix, which
> identifies the subnet. Determining which users are allowed to access which
> partition keys on a given subnet forms an effective policy for isolating users
> on the fabric. Any application that attempts to send traffic on a given subnet
> is automatically subject to the policy, regardless of which device and port it
> uses. SM software configures the subnet through a privileged Subnet Management
> Interface (SMI), which is presented by each Infiniband port. Thus, the SMI must
> also be controlled to prevent unauthorized changes to fabric configuration and
> partitioning. 
> 
> To support access control for IB partitions and subnet management, security
> contexts must be provided for two new types of objects - PKeys and IB ports.
> 
> A PKey label consists of a subnet prefix and a range of PKey values and is
> similar to the labeling mechanism for netports. Each Infiniband port can reside
> on a different subnet. So labeling the PKey values for specific subnet prefixes
> provides the user maximum flexibility, as PKey values may be determined
> independently for different subnets. There is a single access vector for PKeys
> called "access".
> 
> An Infiniband port is labeled by device name and port number. There is a single
> access vector for IB ports called "manage_subnet".
> 
> Because RDMA allows kernel bypass, enforcement must be done during connection
> setup. Communication over RDMA requires a send and receive queue, collectively
> known as a Queue Pair (QP). A QP must be initialized by privileged system calls
> before it can be used to send or receive data. During initialization the user
> must provide the PKey and port the QP will use; at this time access control can
> be enforced.
> 
> Because there is a possibility that the enforcement settings or security
> policy can change, a means of notifying the ib_core module of such changes
> is required. To facilitate this a generic notification callback mechanism
> is added to the LSM. One callback is registered for checking the QP PKey
> associations when the policy changes. MAD agents also register a callback;
> they cache the permission to send and receive SMPs to avoid another
> per-packet call to the LSM.
> 
> Because frequent accesses to the same PKey's SID are expected, a cache is
> implemented which is very similar to the netport cache.
> 
> In order to properly enforce security when changes to the PKey table,
> security policy, or enforcement occur, ib_core must track which QPs are
> using which port, pkey index, and alternate path for every IB device.
> This makes operations that used to be atomic transactional.
> 
> When modifying a QP, ib_core must associate it with the PKey index, port,
> and alternate path specified. If the QP was already associated with
> different settings, the QP is added to the new list prior to the
> modification. If the modify succeeds then the old listing is removed. If
> the modify fails the new listing is removed and the old listing remains
> unchanged.
> 
> When destroying a QP the ib_qp structure is freed by the device-specific
> driver (i.e. mlx4_ib) if the 'destroy' is successful. This requires storing
> security related information in a separate structure. When a 'destroy'
> request is in process the ib_qp structure is in an undefined state so if
> there are changes to the security policy or PKey table, the security checks
> cannot reset the QP if it doesn't have permission for the new setting. If
> the 'destroy' fails, security for that QP must be enforced again and its
> status in the list is restored. If the 'destroy' succeeds the security info
> can be cleaned up and freed.
> 
> There are a number of locks required to protect the QP security structure
> and the QP to device/port/pkey index lists. If multiple locks are required,
> the safe locking order is: QP security structure mutex first, followed by
> any list locks needed, which are sorted first by port followed by pkey
> index.

Ack for the IB parts.  Do we have a vote on the SELinux parts from the
security people?
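
The transactional QP modify flow described in the cover letter (list under
the new settings first, then drop either the old or the new listing
depending on the outcome) can be sketched roughly as follows. This is a
simplified user-space model under stated assumptions: every name here
(demo_qp, demo_modify_qp, demo_hw_ok, demo_hw_fail) is hypothetical, the
per-port/pkey lists are reduced to a boolean table for a single QP, the
alternate path is left out, and the locking discussed in the cover letter
is omitted entirely. It is not the code from the patch series.

```c
#include <stdbool.h>

#define DEMO_PORTS 2
#define DEMO_PKEYS 4

/* Hypothetical model: which (port, pkey index) lists this QP is on. */
struct demo_qp {
	bool listed[DEMO_PORTS][DEMO_PKEYS];
	int port;
	int pkey_index;
};

/* Hypothetical driver modify callback: returns 0 on success. */
typedef int (*demo_modify_fn)(int port, int pkey_index);

/*
 * Caller must pass port < DEMO_PORTS and pkey_index < DEMO_PKEYS.
 * Returns the callback's result.
 */
static int demo_modify_qp(struct demo_qp *qp, int port, int pkey_index,
			  demo_modify_fn hw_modify)
{
	bool same = (port == qp->port && pkey_index == qp->pkey_index);
	int ret;

	/* Add the QP to the new list before attempting the modify. */
	qp->listed[port][pkey_index] = true;

	ret = hw_modify(port, pkey_index);
	if (ret) {
		/* Modify failed: drop the new listing, keep the old one. */
		if (!same)
			qp->listed[port][pkey_index] = false;
		return ret;
	}

	/* Modify succeeded: drop the old listing and record the new one. */
	if (!same)
		qp->listed[qp->port][qp->pkey_index] = false;
	qp->port = port;
	qp->pkey_index = pkey_index;
	return 0;
}

/* Trivial callbacks standing in for a device driver. */
static int demo_hw_ok(int port, int pkey_index)
{
	(void)port;
	(void)pkey_index;
	return 0;
}

static int demo_hw_fail(int port, int pkey_index)
{
	(void)port;
	(void)pkey_index;
	return -1;
}
```

The point of listing under the new settings before the modify is that the QP
is never absent from the tracking lists, so a policy or PKey table change
arriving mid-operation still finds it.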

> ---
> v2:
> - Use void* blobs in the LSM hooks. Paul Moore
> - Make the policy change callback generic. Yuval Shaia, Paul Moore
> - Squash LSM changes into the patches where the calls are added. Paul Moore
> - Don't add new initial SIDs. Stephen Smalley
> - Squash MAD agent PKey and SMI patches and move logic to IB security. Dan Jurgens
> - Changed ib_end_port to ib_port. Paul Moore
> - Changed ib_port access vector from smp to manage_subnet. Paul Moore
> - Added pkey and ib_port details to the audit log. Paul Moore
> - See individual patches for more detail.
> 
> v3:
> - ib_port -> ib_endport. Paul Moore
> - use notifier chains for LSM notifications. Paul Moore
> - reorder parameters in hooks to put security blob first. Paul Moore
> - Don't treat device name as untrusted string in audit log. Paul Moore
> 
> v4:
> - Added separate AVC callback for LSM notifier. Paul Moore
> - Removed unneeded braces in ocontext_read. Paul Moore
> 
> v5:
> - Fix link error when CONFIG_SECURITY is not set. Build Robot
> - Strip issue and Gerrit-Id: Leon Romanovsky
> 
> v6:
> - Whitespace and bracket cleanup. James Morris
> - Cleanup error flow in sel_pkey_sid_slow. James Morris
> 
> Daniel Jurgens (9):
>   IB/core: IB cache enhancements to support Infiniband security
>   IB/core: Enforce PKey security on QPs
>   selinux lsm IB/core: Implement LSM notification system
>   IB/core: Enforce security on management datagrams
>   selinux: Create policydb version for Infiniband support
>   selinux: Allocate and free infiniband security hooks
>   selinux: Implement Infiniband PKey "Access" access vector
>   selinux: Add IB Port SMP access vector
>   selinux: Add a cache for quicker retreival of PKey SIDs
> 
>  drivers/infiniband/core/Makefile     |   3 +-
>  drivers/infiniband/core/cache.c      |  57 ++-
>  drivers/infiniband/core/core_priv.h  | 115 ++++++
>  drivers/infiniband/core/device.c     |  86 +++++
>  drivers/infiniband/core/mad.c        |  52 ++-
>  drivers/infiniband/core/security.c   | 709 +++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_cmd.c |  20 +-
>  drivers/infiniband/core/verbs.c      |  27 +-
>  include/linux/lsm_audit.h            |  15 +
>  include/linux/lsm_hooks.h            |  35 ++
>  include/linux/security.h             |  50 +++
>  include/rdma/ib_mad.h                |   4 +
>  include/rdma/ib_verbs.h              |  49 +++
>  security/Kconfig                     |   9 +
>  security/lsm_audit.c                 |  16 +
>  security/security.c                  |  59 +++
>  security/selinux/Makefile            |   2 +-
>  security/selinux/hooks.c             |  86 ++++-
>  security/selinux/ibpkey.c            | 245 ++++++++++++
>  security/selinux/include/classmap.h  |   4 +
>  security/selinux/include/ibpkey.h    |  31 ++
>  security/selinux/include/objsec.h    |  11 +
>  security/selinux/include/security.h  |   7 +-
>  security/selinux/selinuxfs.c         |   2 +
>  security/selinux/ss/policydb.c       | 129 ++++++-
>  security/selinux/ss/policydb.h       |  27 +-
>  security/selinux/ss/services.c       |  81 ++++
>  27 files changed, 1886 insertions(+), 45 deletions(-)
>  create mode 100644 drivers/infiniband/core/security.c
>  create mode 100644 security/selinux/ibpkey.c
>  create mode 100644 security/selinux/include/ibpkey.h
> 


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [patch] IB/rxe: Remove unneeded cast in rxe_srq_from_attr()
From: Doug Ledford @ 2016-12-12 21:37 UTC (permalink / raw)
  To: Dan Carpenter, Moni Shoua
  Cc: Sean Hefty, Hal Rosenstock, linux-rdma, kernel-janitors
In-Reply-To: <20161117110005.GB32143@mwanda>


[-- Attachment #1.1: Type: text/plain, Size: 511 bytes --]

On 11/17/2016 6:00 AM, Dan Carpenter wrote:
> It makes me nervous when we cast pointer parameters.  I would estimate
> that around 50% of the time, it indicates a bug.  Here the cast is not
> needed because u32 and unsigned int are the same thing.  Removing the
> cast makes the code more robust and future proof in case any of the
> types change.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Thanks, applied.

-- 
Doug Ledford <dledford@redhat.com>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH -next] IB/rxe: Use DEFINE_SPINLOCK() for spinlock
From: Doug Ledford @ 2016-12-12 21:36 UTC (permalink / raw)
  To: Moni Shoua, Wei Yongjun
  Cc: Sean Hefty, Hal Rosenstock, Wei Yongjun, linux-rdma
In-Reply-To: <CAG9sBKPfW9uBAKN8r4tRHyzP9NSoEnhpwSwH9QBiVbq2k+a6gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 448 bytes --]

On 11/6/2016 5:05 AM, Moni Shoua wrote:
> From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> 
> spinlock can be initialized automatically with DEFINE_SPINLOCK()
> rather than explicitly calling spin_lock_init().
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Thanks, applied.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH] IB/rxe: avoid putting a large struct rxe_qp on stack
From: Doug Ledford @ 2016-12-12 21:30 UTC (permalink / raw)
  To: Leon Romanovsky, Arnd Bergmann
  Cc: Sean Hefty, Hal Rosenstock, Moni Shoua, Yonatan Cohen,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160919132811.GI3273-2ukJVAZIZ/Y@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 1107 bytes --]

On 9/19/2016 9:28 AM, Leon Romanovsky wrote:
> On Mon, Sep 19, 2016 at 01:57:26PM +0200, Arnd Bergmann wrote:
>> A race condition fix added an rxe_qp structure to the stack in order
>> to be able to perform rollback in rxe_requester(), but the structure
>> is large enough to trigger the warning for possible stack overflow:
>>
>> drivers/infiniband/sw/rxe/rxe_req.c: In function 'rxe_requester':
>> drivers/infiniband/sw/rxe/rxe_req.c:757:1: error: the frame size of 2064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> This changes the rollback function to only save the psn inside
>> the qp, which is the only field we access in the rollback_qp
>> anyway.
>>
>> Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
>> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
> 
> Thanks Arnd,
> It is a much cleaner approach.
> Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 

Thanks, applied.
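
The idea in the quoted commit message (keep only the psn for rollback instead of copying the whole qp onto the stack) can be sketched in plain C. This is a simplified illustration, not the rxe code: `struct big_qp`, its `other_state` padding and `try_send()` are hypothetical stand-ins, and the 2048-byte size is chosen only to resemble the frame size in the warning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a large queue-pair structure. */
struct big_qp {
	uint32_t psn;            /* the only field the rollback touches */
	char other_state[2048];  /* placeholder for everything else */
};

/*
 * Hypothetical send step. Only the psn is saved before the attempt,
 * so the rollback state costs 4 bytes of stack instead of a full
 * copy of struct big_qp.
 */
int try_send(struct big_qp *qp, int simulate_failure)
{
	uint32_t saved_psn = qp->psn;

	qp->psn++;
	if (simulate_failure) {
		qp->psn = saved_psn;  /* roll back just the saved field */
		return -1;
	}
	return 0;
}
```

Saving a whole `struct big_qp` by value in `try_send()` would add more than 2 KB to its stack frame, which is the kind of growth `-Wframe-larger-than=` reports.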

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD



^ permalink raw reply

* Re: [PATCH V2 10/22] bnxt_re: Support for CQ verbs
From: Jonathan Toppins @ 2016-12-12 21:03 UTC (permalink / raw)
  To: Selvin Xavier, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Eddie Wai, Devesh Sharma,
	Somnath Kotur, Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-11-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

On 12/09/2016 01:48 AM, Selvin Xavier wrote:
> This patch implements support for create_cq, destroy_cq and req_notify_cq
> verbs.
> 
> Signed-off-by: Eddie Wai <eddie.wai-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Somnath Kotur <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c    | 183 ++++++++++++++++++++++++
>  drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h    |  47 ++++++
>  drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c | 154 ++++++++++++++++++++
>  drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h |  19 +++
>  drivers/infiniband/hw/bnxtre/bnxt_re_main.c     |   4 +
>  include/uapi/rdma/bnxt_re_uverbs_abi.h          |  11 ++
>  6 files changed, 418 insertions(+)

Something I just realized is that this patch series does not modify the
MAINTAINERS file. Who from Broadcom will be maintaining this driver?
You probably want to include this info in the v3 series.

[...]

> diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
> index 3417829..f316598 100644
> --- a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
> @@ -60,6 +60,16 @@
>  #include "bnxt_re_ib_verbs.h"
>  #include <rdma/bnxt_re_uverbs_abi.h>
>  
> +static int bnxt_re_copy_to_udata(struct bnxt_re_dev *rdev, void *data, int len,
> +				 struct ib_udata *udata)
> +{
> +	int rc;
> +
> +	rc = ib_copy_to_udata(udata, data, len);
> +
> +	return rc ? -EFAULT : 0;
> +}

This function seems to provide no value by wrapping ib_copy_to_udata,
any reason to keep it? From the two call sites for this function it
appears it can be replaced with a direct call to ib_copy_to_udata.

^ permalink raw reply

* Re: [PATCH] IB/vmw_pvrdma: Update the driver name to vmw_pvrdma
From: Doug Ledford @ 2016-12-12 20:24 UTC (permalink / raw)
  To: Adit Ranadive, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: pv-drivers-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <1481572362-19727-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>



On 12/12/2016 2:52 PM, Adit Ranadive wrote:
> Show the right driver name with the renamed module.
> 
> Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>

Thanks, applied.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD



^ permalink raw reply

* Re: [PATCH] infiniband: hw: hfi1: constify mmu_notifier_ops structure
From: Doug Ledford @ 2016-12-12 20:21 UTC (permalink / raw)
  To: Bhumika Goyal, julia.lawall-L2FTfq7BK8M,
	mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479548868-13563-1-git-send-email-bhumirks-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>



On 11/19/2016 4:47 AM, Bhumika Goyal wrote:
> Declare the structure mmu_notifier_ops as const as it is only stored in
> the ops field of a mmu_notifier structure. The ops field is of type
> const struct mmu_notifier_ops *, so mmu_notifier_ops structures having
> this property can be declared as const.

Thanks, applied.
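
The constification pattern described above can be shown with a small standalone sketch. The types and names here (`struct notifier_ops`, `struct notifier`, `demo_ops`, `count_range()`) are illustrative assumptions, not the mmu_notifier or hfi1 definitions.

```c
#include <assert.h>

/* Hypothetical ops table; a real one would have several callbacks. */
struct notifier_ops {
	int (*invalidate)(int start, int end);
};

struct notifier {
	/* Pointer to const: the holder never writes through it. */
	const struct notifier_ops *ops;
};

static int count_range(int start, int end)
{
	return end - start;
}

/* Because it is only ever read, the table itself can be const. */
static const struct notifier_ops demo_ops = {
	.invalidate = count_range,
};

void notifier_init(struct notifier *n)
{
	n->ops = &demo_ops;
}

int notifier_fire(const struct notifier *n, int start, int end)
{
	return n->ops->invalidate(start, end);
}
```

Declaring the table const lets the compiler reject accidental writes and typically places the function pointers in read-only data.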

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD



^ permalink raw reply

* Re: [PATCH] IB/hfi1: Define platform_config_table_limits once
From: Doug Ledford @ 2016-12-12 20:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Dennis Dalessandro, Dean Luick,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <870f083a-df8e-8a22-52c2-e4d2dde46b9d-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>



On 12/5/2016 7:48 PM, Bart Van Assche wrote:
> Defining static data structures in a header file is wrong because
> this causes the data structure to be instantiated once in every .c
> file it is included in. Hence move the definition of a static
> array from a header file into the only .c file in which it is used.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Applied, thanks.
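
As a rough sketch of the layout the commit message argues for: a static array defined in a header gets its own copy in every .c file that includes it, whereas defining it in the one .c file that uses it yields a single instance. The names and values below (`table_limits`, `table_limits_count()`, `table_limit_at()`) are hypothetical and not the hfi1 platform_config_table_limits contents.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Defined in the only .c file that uses it, so exactly one copy
 * exists in the final image. The values are illustrative.
 */
static const int table_limits[] = { 4, 8, 16, 32 };

/* Number of entries in the table. */
size_t table_limits_count(void)
{
	return sizeof(table_limits) / sizeof(table_limits[0]);
}

/* Bounds-checked lookup; returns -1 for an out-of-range index. */
int table_limit_at(size_t index)
{
	if (index >= table_limits_count())
		return -1;
	return table_limits[index];
}
```

If other files later need the data, a header can declare the accessor functions without defining any storage.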


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD



^ permalink raw reply
