Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed @ 2016-10-30  9:59 UTC (permalink / raw)
  To: David Miller
  Cc: Saeed Mahameed, Doug Ledford, Linux Netdev List,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Leon Romanovsky,
	Tal Alon, Matan Barak
In-Reply-To: <20161028.135309.1712496950641242201.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Fri, Oct 28, 2016 at 7:53 PM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>
> I really disalike pull requests of this form.
>
> You add lots of datastructures and helper functions but no actual
> users of these facilities to the driver.
>
> Do this instead:
>
>         1) Add TSAR infrastructure
>         2) Add use of TSAR facilities to the driver
>
> That's one pull request.
>
> I don't care if this is hard, or if there are entanglements with
> Infiniband or whatever, you must submit changes in this manner.
>

It is not hard, it is just not right,  we have lots of IB and ETH
features that we would like to submit in the same kernel cycle,
with your suggestion I will have to almost submit every feature (core
infrastructure and netdev/RDMA usage)
to you and Doug.  Same for rdma features,  you will receive PULL
request for them as well,
I am sure you and the netdev list don't need such noise.  do not
forget that this will slow down mlx5 progress since
netde will block rdma and vise-versa.

> I will not accept additions to a driver that don't even get really
> used.

For logic/helper functions containing patches such as "Add TSAR
infrastructure" I agree and i can find a way to move some code around
to
avoid future conflicts and remove them from such pull requests.

but you need to at least accept hardware related structures
infrastructure patches for shared code such as
include/linux/mlx5/mlx5_ifc.h where we have only hardware definitions
and those patches are really minimal.

So bottom line, I will do my best to ensure future PULL requests will
contain only include/linux/mlx5/*.h hardware related definitions
or fully implemented features.

Can we agree on that ?

Thanks,
Saeed.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-10-30  9:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161027140509.GH3617-2ukJVAZIZ/Y@public.gmane.org>

> The rdma-core word in the subject is misleading.

Yeah it is. The location of the code (qede) is somewhat different from the content (qedr).
I don't know how you would have done it, but at least you'll see in a future patch that we've
changed this  to be fully qedr.

Ram


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
From: Leon Romanovsky @ 2016-10-30  9:13 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A445F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

On Fri, Oct 28, 2016 at 10:53:13PM +0000, Hefty, Sean wrote:
> > The current code creates an IDR per type. Since types are currently
> > common for all vendors and known in advance, this was good enough.
> > However, the proposed ioctl based infrastructure allows each vendor
> > to declare only some of the common types and declare its own specific
> > types.
> >
> > Thus, we decided to implement IDR to be per device and refactor it to
> > use a new file.
>
> I think this needs to be more abstract.  I would consider introducing the concept of an 'ioctl provider', with the idr per ioctl provider.  You could then make each ib_device an ioctl provider.  (Just embed the structure).  I believe this will be necessary to support the rdma_cm, ib_cm, as well as devices that export different sets of ioctls, where an ib_device isn't necessarily available.

IDR management is internal to kernel and it looks like an easy one to extend in the future.

>
> Essentially, I would treat plugging into the uABI independent from plugging into the kernel verbs API.  Otherwise, I think we'll end up with multiple ioctl 'frameworks'.
>
> - Sean
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
From: Matan Barak @ 2016-10-30  8:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Matan Barak, linux-rdma, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon
In-Reply-To: <20161028154628.GP3617-2ukJVAZIZ/Y@public.gmane.org>

On Fri, Oct 28, 2016 at 5:46 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
>> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
>> > Just to summarize, to be sure that I understood you correctly.
>> >
>> > ---------    --------------------
>> > | write | -> | conversion logic | ---
>> > ---------    --------------------   |      ----------------------
>> >                                     -----> | internal interface |
>> > ---------                           |      ----------------------
>> > | ioctl | ---------------------------
>> > ---------
>> >
>> > Am I right?
>>
>> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.
>
> Thanks
>

If we accept the limitations here (i.e - all commands attributes come
either from kernel or from user,
but you can't mix them - that's mean the write comparability layer
either needs to copy all attributes
or use a direct mapping for all of them), I could just either break
ib_uverbs_cmd_verbs to a a few functions
or just pass a callback of boxing the descriptors copy.

>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: iscsi_trx going into D state
From: Nicholas A. Bellinger @ 2016-10-29 22:29 UTC (permalink / raw)
  To: Robert LeBlanc
  Cc: Zhu Lingshan, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAANLjFoGEi29goybqsvEg6trystEkurVz52P8SwqGUSNV1jdSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Robert,

On Wed, 2016-10-19 at 10:41 -0600, Robert LeBlanc wrote:
> Nicholas,
> 
> I didn't have high hopes for the patch because we were not seeing
> TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it
> seemed to help regardless. Our clients finally OOMed from the hung
> sessions, so we are having to reboot them and we will do some more
> testing. We haven't put the updated kernel on our clients yet. Our
> clients have iSCSI root disks so I'm not sure if we can get a vmcore
> on those, but we will do what we can to get you a vmcore from the
> target if it happens again.
> 

Just checking in to see if you've observed further issues with
iser-target ports, and/or able to generate a crashdump with v4.4.y..?

> As far as our configuration: It is a superMicro box with 6 SAMSUNG
> MZ7LM3T8HCJM-00005 SSDs. Two are for root and four are in mdadm
> RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the
> RAID-10 for checksum and snapshots only and we export ZVols to the
> clients (one or more per VM on the client). We do not persist the
> export info (targetcli saveconfig), but regenerate it from scripts.
> The client receives two or more of these exports and puts them in a
> RAID-1 device. The exports are served by iSER one one port and also by
> normal iSCSI on a different port for compatibility, but not normally
> used. If you need more info about the config, please let me know. It
> was kind of a vague request so I'm not sure what exactly is important
> to you.

Thanks for the extra details of your hardware + user-space
configuration.

> Thanks for helping us with this,
> Robert LeBlanc
> 
> When we have problems, we usually see this in the logs:
> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on
> Network Portal 0.0.0.0:3260
> Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48
> Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48.
> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed.
> 
> I found some backtraces in the logs, not sure if this is helpful, this
> is before your patch (your patch booted at Oct 18 10:36:59):
> Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched
> self-detected stall on CPU
> Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this
> GP) idle=b59/140000000000001/0 softirq=535/535 fqs=30992
> Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550
> c=1549 q=0)
> Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5:
> Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2   R  running
> task        0 17967      2 0x00000008
> Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq
> isert_cq_work [ib_isert]
> Oct 17 15:43:12 prv-0-12-sanstack kernel: ffff883f4c0dca80
> 00000000af8ca7a4 ffff883f7fb43da8 ffffffff810ac83f
> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000005
> ffffffff81adb680 ffff883f7fb43dc0 ffffffff810af179
> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000006
> ffff883f7fb43df0 ffffffff810e1c10 ffff883f7fb57b80
> Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace:
> Oct 17 15:43:12 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac83f>]
> sched_show_task+0xaf/0x110
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810af179>]
> dump_cpu_task+0x39/0x40
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e1c10>]
> rcu_dump_cpu_stacks+0x80/0xb0
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e6040>]
> rcu_check_callbacks+0x540/0x820
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810afd51>] ?
> account_system_time+0x81/0x110
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9a0>] ?
> tick_sched_do_timer+0x50/0x50
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810eb4d9>]
> update_process_times+0x39/0x60
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa755>]
> tick_sched_handle.isra.17+0x25/0x60
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9dd>]
> tick_sched_timer+0x3d/0x70
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec0c2>]
> __hrtimer_run_queues+0x102/0x290
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec5a8>]
> hrtimer_interrupt+0xa8/0x1a0
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
> local_apic_timer_interrupt+0x35/0x60
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8172343d>]
> smp_apic_timer_interrupt+0x3d/0x50
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff817216f7>]
> apic_timer_interrupt+0x87/0x90
> Oct 17 15:43:12 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d70fe>]
> ? console_unlock+0x41e/0x4e0
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d74bc>]
> vprintk_emit+0x2fc/0x500
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d783f>]
> vprintk_default+0x1f/0x30
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81174c2a>] printk+0x5d/0x74
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814bc351>]
> transport_lookup_cmd_lun+0x1d1/0x200
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814edcf0>]
> iscsit_setup_scsi_cmd+0x230/0x540
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0890bf3>]
> isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0891174>]
> isert_cq_work+0x184/0x770 [ib_isert]
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109734f>]
> process_one_work+0x14f/0x400
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097bc4>]
> worker_thread+0x114/0x470
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8171c55a>] ?
> __schedule+0x34a/0x7f0
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097ab0>] ?
> rescuer_thread+0x310/0x310
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d708>] kthread+0xd8/0xf0
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
> kthread_park+0x60/0x60
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81720c8f>]
> ret_from_fork+0x3f/0x70
> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
> kthread_park+0x60/0x60
> 
> Oct 17 16:34:03 prv-0-12-sanstack kernel: INFO: rcu_sched
> self-detected stall on CPU
> Oct 17 16:34:03 prv-0-12-sanstack kernel: #01128-...: (5999 ticks this
> GP) idle=2f9/140000000000001/0 softirq=457/457 fqs=4830
> Oct 17 16:34:03 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=3546
> c=3545 q=0)
> Oct 17 16:34:03 prv-0-12-sanstack kernel: Task dump for CPU 28:
> Oct 17 16:34:03 prv-0-12-sanstack kernel: iscsi_np        R  running
> task        0 16597      2 0x0000000c
> Oct 17 16:34:03 prv-0-12-sanstack kernel: ffff887f40350000
> 00000000b98a67bb ffff887f7f503da8 ffffffff810ac8ff
> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001c
> ffffffff81adb680 ffff887f7f503dc0 ffffffff810af239
> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001d
> ffff887f7f503df0 ffffffff810e1cd0 ffff887f7f517b80
> Oct 17 16:34:03 prv-0-12-sanstack kernel: Call Trace:
> Oct 17 16:34:03 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
> sched_show_task+0xaf/0x110
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810af239>]
> dump_cpu_task+0x39/0x40
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
> rcu_dump_cpu_stacks+0x80/0xb0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
> rcu_check_callbacks+0x540/0x820
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
> account_system_time+0x81/0x110
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
> tick_sched_do_timer+0x50/0x50
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
> update_process_times+0x39/0x60
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
> tick_sched_handle.isra.17+0x25/0x60
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
> tick_sched_timer+0x3d/0x70
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
> __hrtimer_run_queues+0x102/0x290
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
> hrtimer_interrupt+0xa8/0x1a0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
> local_apic_timer_interrupt+0x35/0x60
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
> smp_apic_timer_interrupt+0x3d/0x50
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
> apic_timer_interrupt+0x87/0x90
> Oct 17 16:34:03 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
> ? console_unlock+0x41e/0x4e0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d757c>]
> vprintk_emit+0x2fc/0x500
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d78ff>]
> vprintk_default+0x1f/0x30
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81174dde>] printk+0x5d/0x74
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e71ad>]
> iscsi_target_locate_portal+0x62d/0x6f0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e5100>]
> iscsi_target_login_thread+0x6f0/0xfc0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e4a10>] ?
> iscsi_target_login_sess_out+0x250/0x250
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
> kthread_park+0x60/0x60
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
> ret_from_fork+0x3f/0x70
> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
> kthread_park+0x60/0x60
> 
> I don't think this one is related, but it happened a couple of times:
> Oct 17 11:46:52 prv-0-12-sanstack kernel: INFO: rcu_sched
> self-detected stall on CPU
> Oct 17 11:46:52 prv-0-12-sanstack kernel: #01119-...: (5999 ticks this
> GP) idle=727/140000000000001/0 softirq=1346/1346 fqs=4990
> Oct 17 11:46:52 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=4295
> c=4294 q=0)
> Oct 17 11:46:52 prv-0-12-sanstack kernel: Task dump for CPU 19:
> Oct 17 11:46:52 prv-0-12-sanstack kernel: kworker/19:1    R  running
> task        0   301      2 0x00000008
> Oct 17 11:46:52 prv-0-12-sanstack kernel: Workqueue:
> events_power_efficient fb_flashcursor
> Oct 17 11:46:52 prv-0-12-sanstack kernel: ffff883f6009ca80
> 00000000010a7cdd ffff883f7fcc3da8 ffffffff810ac8ff
> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000013
> ffffffff81adb680 ffff883f7fcc3dc0 ffffffff810af239
> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000014
> ffff883f7fcc3df0 ffffffff810e1cd0 ffff883f7fcd7b80
> Oct 17 11:46:52 prv-0-12-sanstack kernel: Call Trace:
> Oct 17 11:46:52 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
> sched_show_task+0xaf/0x110
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810af239>]
> dump_cpu_task+0x39/0x40
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
> rcu_dump_cpu_stacks+0x80/0xb0
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
> rcu_check_callbacks+0x540/0x820
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
> account_system_time+0x81/0x110
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
> tick_sched_do_timer+0x50/0x50
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
> update_process_times+0x39/0x60
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
> tick_sched_handle.isra.17+0x25/0x60
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
> tick_sched_timer+0x3d/0x70
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
> __hrtimer_run_queues+0x102/0x290
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
> hrtimer_interrupt+0xa8/0x1a0
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
> local_apic_timer_interrupt+0x35/0x60
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
> smp_apic_timer_interrupt+0x3d/0x50
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
> apic_timer_interrupt+0x87/0x90
> Oct 17 11:46:52 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
> ? console_unlock+0x41e/0x4e0
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff813866ad>]
> fb_flashcursor+0x5d/0x140
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8138bc00>] ?
> bit_clear+0x110/0x110
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109740f>]
> process_one_work+0x14f/0x400
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097c84>]
> worker_thread+0x114/0x470
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8171cdda>] ?
> __schedule+0x34a/0x7f0
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097b70>] ?
> rescuer_thread+0x310/0x310
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
> kthread_park+0x60/0x60
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
> ret_from_fork+0x3f/0x70
> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
> kthread_park+0x60/0x60

RCU self-detected schedule stalls typically mean some code is
monopolizing execution on a specific CPU for an extended period of time
(eg: endless loop), preventing normal RCU grace-period callbacks from
running in a timely manner.

It's hard to tell without more log context and/or crashdump what was
going on here.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Or Gerlitz @ 2016-10-29 18:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	skc-YOWKrPYUwWM, Weiny, Ira, Jason Gunthorpe, John Fleck,
	Leon Romanovsky, Liran Liss, Matan Barak, Tzahi Oved
In-Reply-To: <alpine.DEB.2.20.1610281212220.8691-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>

On Fri, Oct 28, 2016 at 8:17 PM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:

> - RDMA workshop on Tuesday, 1st of November
>         Meeting in Sweeney AB from 9am till 5pm.
>         See https://www.linuxplumbersconf.org/2016/ocw/events/LPC2016/schedule
>         This is part of the Kernel Summit and the Linux Plumbers
>         Conference. Only open to KS and LPC attendees with an invitation.

isn't that open to all LPC attendees? didn't follow on the invitation
requirement.. it's part of LPC
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH -next] qedr: Use list_move_tail instead of list_del/list_add_tail
From: Wei Yongjun @ 2016-10-29 16:19 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock, Ram Amrani,
	Rajesh Borundia
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Using list_move_tail() instead of list_del() + list_add_tail().

Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index a615142..cdaddf9 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2413,8 +2413,7 @@ static void handle_completed_mrs(struct qedr_dev *dev, struct mr_info *info)
 		 */
 		pbl = list_first_entry(&info->inuse_pbl_list,
 				       struct qedr_pbl, list_entry);
-		list_del(&pbl->list_entry);
-		list_add_tail(&pbl->list_entry, &info->free_pbl_list);
+		list_move_tail(&pbl->list_entry, &info->free_pbl_list);
 		info->completed_handled++;
 	}
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH -next] IB/rxe: Use DEFINE_SPINLOCK() for spinlock
From: Wei Yongjun @ 2016-10-29 16:19 UTC (permalink / raw)
  To: Moni Shoua, Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_net.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index b8258e4..4cb6378 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -46,7 +46,7 @@
 #include "rxe_loc.h"
 
 static LIST_HEAD(rxe_dev_list);
-static spinlock_t dev_list_lock; /* spinlock for device list */
+static DEFINE_SPINLOCK(dev_list_lock); /* spinlock for device list */
 
 struct rxe_dev *net_to_rxe(struct net_device *ndev)
 {
@@ -663,8 +663,6 @@ struct notifier_block rxe_net_notifier = {
 
 int rxe_net_ipv4_init(void)
 {
-	spin_lock_init(&dev_list_lock);
-
 	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
 				htons(ROCE_V2_UDP_DPORT), false);
 	if (IS_ERR(recv_sockets.sk4)) {
@@ -680,8 +678,6 @@ int rxe_net_ipv6_init(void)
 {
 #if IS_ENABLED(CONFIG_IPV6)
 
-	spin_lock_init(&dev_list_lock);
-
 	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
 						htons(ROCE_V2_UDP_DPORT), true);
 	if (IS_ERR(recv_sockets.sk6)) {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 7/7] libhns: Add consolidated repo for userspace library of hns
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch configures the consolidated repo to build userspace
library of hns(libhns).

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- Delete the CHECK_C_SOURCE_COMPILES and sort the .c file

v1:
- The initial submit
---
 CMakeLists.txt               | 1 +
 MAINTAINERS                  | 6 ++++++
 README.md                    | 1 +
 providers/hns/CMakeLists.txt | 6 ++++++
 4 files changed, 14 insertions(+)
 create mode 100644 providers/hns/CMakeLists.txt

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 230aab5..5ce8e15 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -328,6 +328,7 @@ add_subdirectory(libibcm)
 add_subdirectory(providers/cxgb3)
 add_subdirectory(providers/cxgb4)
 add_subdirectory(providers/hfi1verbs)
+add_subdirectory(providers/hns)
 add_subdirectory(providers/i40iw)
 add_subdirectory(providers/ipathverbs)
 add_subdirectory(providers/mlx4)
diff --git a/MAINTAINERS b/MAINTAINERS
index d83de10..bc6eb50 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,6 +57,12 @@ S:	Supported
 L:	intel-opa-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org (moderated for non-subscribers)
 F:	providers/hfi1verbs/
 
+HNS USERSPACE PROVIDER (for hns-roce.ko)
+M:	Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+M:	Wei Hu(Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+S:	Supported
+F:	providers/hns/
+
 I40IW USERSPACE PROVIDER (for i40iw.ko)
 M:	Tatyana Nikolova <Tatyana.E.Nikolova-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
 S:	Supported
diff --git a/README.md b/README.md
index 3a13042..e3bc33f 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ is included:
  - iw_cxgb3.ko
  - iw_cxgb4.ko
  - hfi1.ko
+ - hns-roce.ko
  - i40iw.ko
  - ib_qib.ko
  - mlx4_ib.ko
diff --git a/providers/hns/CMakeLists.txt b/providers/hns/CMakeLists.txt
new file mode 100644
index 0000000..19a793e
--- /dev/null
+++ b/providers/hns/CMakeLists.txt
@@ -0,0 +1,6 @@
+rdma_provider(hns
+  hns_roce_u.c
+  hns_roce_u_buf.c
+  hns_roce_u_hw_v1.c
+  hns_roce_u_verbs.c
+)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 6/7] libhns: Add verbs of post_send and post_recv support
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the verbs of posting send
and psoting recv.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 providers/hns/hns_roce_u.c       |   2 +
 providers/hns/hns_roce_u.h       |   8 +
 providers/hns/hns_roce_u_hw_v1.c | 314 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h |  79 ++++++++++
 4 files changed, 403 insertions(+)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 30f8678..bceed84 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -131,6 +131,8 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.query_qp	   = hns_roce_u_query_qp;
 	context->ibv_ctx.ops.modify_qp     = hr_dev->u_hw->modify_qp;
 	context->ibv_ctx.ops.destroy_qp    = hr_dev->u_hw->destroy_qp;
+	context->ibv_ctx.ops.post_send     = hr_dev->u_hw->post_send;
+	context->ibv_ctx.ops.post_recv     = hr_dev->u_hw->post_recv;
 
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 02b9251..4a6ed8e 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -51,6 +51,10 @@
 
 #define PFX				"hns: "
 
+#ifndef likely
+#define likely(x)     __builtin_expect(!!(x), 1)
+#endif
+
 #define roce_get_field(origin, mask, shift) \
 	(((origin) & (mask)) >> (shift))
 
@@ -171,6 +175,10 @@ struct hns_roce_qp {
 struct hns_roce_u_hw {
 	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
 	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+	int (*post_send)(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+			 struct ibv_send_wr **bad_wr);
+	int (*post_recv)(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+			 struct ibv_recv_wr **bad_wr);
 	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
 			 int attr_mask);
 	int (*destroy_qp)(struct ibv_qp *ibqp);
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index fb81634..a3aad1c 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -37,6 +37,59 @@
 #include "hns_roce_u_hw_v1.h"
 #include "hns_roce_u.h"
 
+static inline void set_raddr_seg(struct hns_roce_wqe_raddr_seg *rseg,
+				 uint64_t remote_addr, uint32_t rkey)
+{
+	rseg->raddr    = remote_addr;
+	rseg->rkey     = rkey;
+	rseg->len      = 0;
+}
+
+static void set_data_seg(struct hns_roce_wqe_data_seg *dseg, struct ibv_sge *sg)
+{
+
+	dseg->lkey = sg->lkey;
+	dseg->addr = sg->addr;
+	dseg->len = sg->length;
+}
+
+static void hns_roce_update_rq_head(struct hns_roce_context *ctx,
+				    unsigned int qpn, unsigned int rq_head)
+{
+	struct hns_roce_rq_db rq_db;
+
+	rq_db.u32_4 = 0;
+	rq_db.u32_8 = 0;
+
+	roce_set_field(rq_db.u32_4, RQ_DB_U32_4_RQ_HEAD_M,
+		       RQ_DB_U32_4_RQ_HEAD_S, rq_head);
+	roce_set_field(rq_db.u32_8, RQ_DB_U32_8_QPN_M, RQ_DB_U32_8_QPN_S, qpn);
+	roce_set_field(rq_db.u32_8, RQ_DB_U32_8_CMD_M, RQ_DB_U32_8_CMD_S, 1);
+	roce_set_bit(rq_db.u32_8, RQ_DB_U32_8_HW_SYNC_S, 1);
+
+	hns_roce_write64((uint32_t *)&rq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_update_sq_head(struct hns_roce_context *ctx,
+				    unsigned int qpn, unsigned int port,
+				    unsigned int sl, unsigned int sq_head)
+{
+	struct hns_roce_sq_db sq_db;
+
+	sq_db.u32_4 = 0;
+	sq_db.u32_8 = 0;
+
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SQ_HEAD_M,
+		       SQ_DB_U32_4_SQ_HEAD_S, sq_head);
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_PORT_M, SQ_DB_U32_4_PORT_S,
+		       port);
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SL_M, SQ_DB_U32_4_SL_S, sl);
+	roce_set_field(sq_db.u32_8, SQ_DB_U32_8_QPN_M, SQ_DB_U32_8_QPN_S, qpn);
+	roce_set_bit(sq_db.u32_8, SQ_DB_U32_8_HW_SYNC, 1);
+
+	hns_roce_write64((uint32_t *)&sq_db, ctx, ROCEE_DB_SQ_L_0_REG);
+}
+
 static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
 					  struct hns_roce_cq *cq)
 {
@@ -126,6 +179,16 @@ static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
 	return get_sw_cqe(cq, cq->cons_index);
 }
 
+static void *get_recv_wqe(struct hns_roce_qp *qp, int n)
+{
+	if ((n < 0) || (n > qp->rq.wqe_cnt)) {
+		printf("rq wqe index:%d,rq wqe cnt:%d\r\n", n, qp->rq.wqe_cnt);
+		return NULL;
+	}
+
+	return qp->buf.buf + qp->rq.offset + (n << qp->rq.wqe_shift);
+}
+
 static void *get_send_wqe(struct hns_roce_qp *qp, int n)
 {
 	if ((n < 0) || (n > qp->sq.wqe_cnt)) {
@@ -137,6 +200,26 @@ static void *get_send_wqe(struct hns_roce_qp *qp, int n)
 				  (n << qp->sq.wqe_shift));
 }
 
+static int hns_roce_wq_overflow(struct hns_roce_wq *wq, int nreq,
+				struct hns_roce_cq *cq)
+{
+	unsigned int cur;
+
+	cur = wq->head - wq->tail;
+	if (cur + nreq < wq->max_post)
+		return 0;
+
+	/* While the num of wqe exceeds cap of the device, cq will be locked */
+	pthread_spin_lock(&cq->lock);
+	cur = wq->head - wq->tail;
+	pthread_spin_unlock(&cq->lock);
+
+	printf("wq:(head = %d, tail = %d, max_post = %d), nreq = 0x%x\n",
+		wq->head, wq->tail, wq->max_post, nreq);
+
+	return cur + nreq >= wq->max_post;
+}
+
 static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
 					    uint32_t qpn)
 {
@@ -374,6 +457,144 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
 	return 0;
 }
 
+static int hns_roce_u_v1_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+				   struct ibv_send_wr **bad_wr)
+{
+	unsigned int ind;
+	void *wqe;
+	int nreq;
+	int ps_opcode, i;
+	int ret = 0;
+	struct hns_roce_wqe_ctrl_seg *ctrl = NULL;
+	struct hns_roce_wqe_data_seg *dseg = NULL;
+	struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+	pthread_spin_lock(&qp->sq.lock);
+
+	/* check that state is OK to post send */
+	ind = qp->sq.head;
+
+	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (hns_roce_wq_overflow(&qp->sq, nreq,
+					 to_hr_cq(qp->ibv_qp.send_cq))) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+		if (wr->num_sge > qp->sq.max_gs) {
+			ret = -1;
+			*bad_wr = wr;
+			printf("wr->num_sge(<=%d) = %d, check failed!\r\n",
+				qp->sq.max_gs, wr->num_sge);
+			goto out;
+		}
+
+		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+		memset(ctrl, 0, sizeof(struct hns_roce_wqe_ctrl_seg));
+
+		qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
+		for (i = 0; i < wr->num_sge; i++)
+			ctrl->msg_length += wr->sg_list[i].length;
+
+
+		ctrl->flag |= ((wr->send_flags & IBV_SEND_SIGNALED) ?
+				HNS_ROCE_WQE_CQ_NOTIFY : 0) |
+			      (wr->send_flags & IBV_SEND_SOLICITED ?
+				HNS_ROCE_WQE_SE : 0) |
+			      ((wr->opcode == IBV_WR_SEND_WITH_IMM ||
+			       wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ?
+				HNS_ROCE_WQE_IMM : 0) |
+			      (wr->send_flags & IBV_SEND_FENCE ?
+				HNS_ROCE_WQE_FENCE : 0);
+
+		if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
+		    wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
+			ctrl->imm_data = wr->imm_data;
+
+		wqe += sizeof(struct hns_roce_wqe_ctrl_seg);
+
+		/* set remote addr segment */
+		switch (ibvqp->qp_type) {
+		case IBV_QPT_RC:
+			switch (wr->opcode) {
+			case IBV_WR_RDMA_READ:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_READ;
+				set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+					      wr->wr.rdma.rkey);
+				break;
+			case IBV_WR_RDMA_WRITE:
+			case IBV_WR_RDMA_WRITE_WITH_IMM:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_WRITE;
+				set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+					      wr->wr.rdma.rkey);
+				break;
+			case IBV_WR_SEND:
+			case IBV_WR_SEND_WITH_IMM:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_SEND;
+				break;
+			case IBV_WR_ATOMIC_CMP_AND_SWP:
+			case IBV_WR_ATOMIC_FETCH_AND_ADD:
+			default:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_MASK;
+				break;
+			}
+			ctrl->flag |= (ps_opcode);
+			wqe  += sizeof(struct hns_roce_wqe_raddr_seg);
+			break;
+		case IBV_QPT_UC:
+		case IBV_QPT_UD:
+		default:
+			break;
+		}
+
+		dseg = wqe;
+
+		/* Inline */
+		if (wr->send_flags & IBV_SEND_INLINE && wr->num_sge) {
+			if (ctrl->msg_length > qp->max_inline_data) {
+				ret = -1;
+				*bad_wr = wr;
+				printf("inline data len(1-32)=%d, send_flags = 0x%x, check failed!\r\n",
+					wr->send_flags, ctrl->msg_length);
+				return ret;
+			}
+
+			for (i = 0; i < wr->num_sge; i++) {
+				memcpy(wqe,
+				     ((void *) (uintptr_t) wr->sg_list[i].addr),
+				     wr->sg_list[i].length);
+				wqe = wqe + wr->sg_list[i].length;
+			}
+
+			ctrl->flag |= HNS_ROCE_WQE_INLINE;
+		} else {
+			/* set sge */
+			for (i = 0; i < wr->num_sge; i++)
+				set_data_seg(dseg+i, wr->sg_list + i);
+
+			ctrl->flag |= wr->num_sge << HNS_ROCE_WQE_SGE_NUM_BIT;
+		}
+
+		ind++;
+	}
+
+out:
+	/* Set DB return */
+	if (likely(nreq)) {
+		qp->sq.head += nreq;
+		wmb();
+
+		hns_roce_update_sq_head(ctx, qp->ibv_qp.qp_num,
+				qp->port_num - 1, qp->sl,
+				qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
+	}
+
+	pthread_spin_unlock(&qp->sq.lock);
+
+	return ret;
+}
+
 static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
 				   struct hns_roce_srq *srq)
 {
@@ -517,9 +738,102 @@ static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
 	return ret;
 }
 
+static int hns_roce_u_v1_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+				   struct ibv_recv_wr **bad_wr)
+{
+	int ret = 0;
+	int nreq;
+	int ind;
+	struct ibv_sge *sg;
+	struct hns_roce_rc_rq_wqe *rq_wqe;
+	struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+	pthread_spin_lock(&qp->rq.lock);
+
+	/* check that state is OK to post receive */
+	ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
+
+	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (hns_roce_wq_overflow(&qp->rq, nreq,
+					 to_hr_cq(qp->ibv_qp.recv_cq))) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		if (wr->num_sge > qp->rq.max_gs) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		rq_wqe = get_recv_wqe(qp, ind);
+		if (wr->num_sge > HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM);
+			sg = wr->sg_list;
+
+			rq_wqe->va0 = (sg->addr);
+			rq_wqe->l_key0 = (sg->lkey);
+			rq_wqe->length0 = (sg->length);
+
+			sg = wr->sg_list + 1;
+
+			rq_wqe->va1 = (sg->addr);
+			rq_wqe->l_key1 = (sg->lkey);
+			rq_wqe->length1 = (sg->length);
+		} else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1);
+			sg = wr->sg_list;
+
+			rq_wqe->va0 = (sg->addr);
+			rq_wqe->l_key0 = (sg->lkey);
+			rq_wqe->length0 = (sg->length);
+
+		} else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2);
+		}
+
+		qp->rq.wrid[ind] = wr->wr_id;
+
+		ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+	}
+
+out:
+	if (nreq) {
+		qp->rq.head += nreq;
+
+		wmb();
+
+		hns_roce_update_rq_head(ctx, qp->ibv_qp.qp_num,
+				    qp->rq.head & ((qp->rq.wqe_cnt << 1) - 1));
+	}
+
+	pthread_spin_unlock(&qp->rq.lock);
+
+	return ret;
+}
+
 struct hns_roce_u_hw hns_roce_u_hw_v1 = {
 	.poll_cq = hns_roce_u_v1_poll_cq,
 	.arm_cq = hns_roce_u_v1_arm_cq,
+	.post_send = hns_roce_u_v1_post_send,
+	.post_recv = hns_roce_u_v1_post_recv,
 	.modify_qp = hns_roce_u_v1_modify_qp,
 	.destroy_qp = hns_roce_u_v1_destroy_qp,
 };
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
index b249f54..128c66f 100644
--- a/providers/hns/hns_roce_u_hw_v1.h
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -39,9 +39,15 @@
 #define HNS_ROCE_CQE_IS_SQ			0
 
 #define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN	32
+#define HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM		2
 
 enum {
+	HNS_ROCE_WQE_INLINE		= 1 << 31,
+	HNS_ROCE_WQE_SE			= 1 << 30,
+	HNS_ROCE_WQE_SGE_NUM_BIT	= 24,
 	HNS_ROCE_WQE_IMM		= 1 << 23,
+	HNS_ROCE_WQE_FENCE		= 1 << 21,
+	HNS_ROCE_WQE_CQ_NOTIFY		 = 1 << 20,
 	HNS_ROCE_WQE_OPCODE_SEND        = 0 << 16,
 	HNS_ROCE_WQE_OPCODE_RDMA_READ   = 1 << 16,
 	HNS_ROCE_WQE_OPCODE_RDMA_WRITE  = 2 << 16,
@@ -52,6 +58,20 @@ enum {
 struct hns_roce_wqe_ctrl_seg {
 	__be32		sgl_pa_h;
 	__be32		flag;
+	__be32		imm_data;
+	__be32		msg_length;
+};
+
+struct hns_roce_wqe_data_seg {
+	__be64		addr;
+	__be32		lkey;
+	__be32		len;
+};
+
+struct hns_roce_wqe_raddr_seg {
+	__be32		rkey;
+	__be32		len;
+	__be64		raddr;
 };
 
 enum {
@@ -102,6 +122,43 @@ struct hns_roce_cq_db {
 
 #define CQ_DB_U32_8_HW_SYNC_S 31
 
+struct hns_roce_rq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+
+#define RQ_DB_U32_4_RQ_HEAD_S 0
+#define RQ_DB_U32_4_RQ_HEAD_M   (((1UL << 15) - 1) << RQ_DB_U32_4_RQ_HEAD_S)
+
+#define RQ_DB_U32_8_QPN_S 0
+#define RQ_DB_U32_8_QPN_M   (((1UL << 24) - 1) << RQ_DB_U32_8_QPN_S)
+
+#define RQ_DB_U32_8_CMD_S 28
+#define RQ_DB_U32_8_CMD_M   (((1UL << 3) - 1) << RQ_DB_U32_8_CMD_S)
+
+#define RQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_sq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+
+#define SQ_DB_U32_4_SQ_HEAD_S 0
+#define SQ_DB_U32_4_SQ_HEAD_M (((1UL << 15) - 1) << SQ_DB_U32_4_SQ_HEAD_S)
+
+#define SQ_DB_U32_4_SL_S 16
+#define SQ_DB_U32_4_SL_M (((1UL << 2) - 1) << SQ_DB_U32_4_SL_S)
+
+#define SQ_DB_U32_4_PORT_S 18
+#define SQ_DB_U32_4_PORT_M (((1UL << 3) - 1) << SQ_DB_U32_4_PORT_S)
+
+#define SQ_DB_U32_4_DIRECT_WQE_S 31
+
+#define SQ_DB_U32_8_QPN_S 0
+#define SQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << SQ_DB_U32_8_QPN_S)
+
+#define SQ_DB_U32_8_HW_SYNC 31
+
 struct hns_roce_cqe {
 	unsigned int cqe_byte_4;
 	union {
@@ -160,4 +217,26 @@ struct hns_roce_rc_send_wqe {
 	unsigned int length1;
 };
 
+struct hns_roce_rc_rq_wqe {
+	unsigned int u32_0;
+	unsigned int sgl_ba_31_0;
+	unsigned int u32_2;
+	unsigned int rvd_5;
+	unsigned int rvd_6;
+	unsigned int rvd_7;
+	unsigned int rvd_8;
+	unsigned int rvd_9;
+
+	uint64_t     va0;
+	unsigned int l_key0;
+	unsigned int length0;
+
+	uint64_t     va1;
+	unsigned int l_key1;
+	unsigned int length1;
+};
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_S 16
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_M \
+	(((1UL << 6) - 1) << RC_RQ_WQE_NUMBER_OF_DATA_SEG_S)
+
 #endif /* _HNS_ROCE_U_HW_V1_H */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 5/7] libhns: Add verbs of qp support
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the relatived qp verbs for userspace
library of hns, include:
    1. create_qp
    2. query_qp
    3. modify_qp
    4. destroy_qp

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- Delete the min() and use the ccan header

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |   5 +
 providers/hns/hns_roce_u.h       |  45 +++++++
 providers/hns/hns_roce_u_abi.h   |   8 ++
 providers/hns/hns_roce_u_hw_v1.c | 155 +++++++++++++++++++++++
 providers/hns/hns_roce_u_verbs.c | 259 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 471 insertions(+), 1 deletion(-)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index e435bea..30f8678 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -127,6 +127,11 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.cq_event	   = hns_roce_u_cq_event;
 	context->ibv_ctx.ops.destroy_cq    = hns_roce_u_destroy_cq;
 
+	context->ibv_ctx.ops.create_qp     = hns_roce_u_create_qp;
+	context->ibv_ctx.ops.query_qp	   = hns_roce_u_query_qp;
+	context->ibv_ctx.ops.modify_qp     = hr_dev->u_hw->modify_qp;
+	context->ibv_ctx.ops.destroy_qp    = hr_dev->u_hw->destroy_qp;
+
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
 
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index c3e364d..02b9251 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -44,6 +44,7 @@
 
 #define HNS_ROCE_MAX_CQ_NUM		0x10000
 #define HNS_ROCE_MIN_CQE_NUM		0x40
+#define HNS_ROCE_MIN_WQE_NUM		0x20
 #define HNS_ROCE_CQ_DB_BUF_SIZE		((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
 #define HNS_ROCE_TPTR_OFFSET		0x1000
 #define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
@@ -128,10 +129,29 @@ struct hns_roce_cq {
 	int				arm_sn;
 };
 
+struct hns_roce_srq {
+	struct ibv_srq			ibv_srq;
+	struct hns_roce_buf		buf;
+	pthread_spinlock_t		lock;
+	unsigned long			*wrid;
+	unsigned int			srqn;
+	int				max;
+	unsigned int			max_gs;
+	int				wqe_shift;
+	int				head;
+	int				tail;
+	unsigned int			*db;
+	unsigned short			counter;
+};
+
 struct hns_roce_wq {
 	unsigned long			*wrid;
+	pthread_spinlock_t		lock;
 	unsigned int			wqe_cnt;
+	int				max_post;
+	unsigned int			head;
 	unsigned int			tail;
+	unsigned int			max_gs;
 	int				wqe_shift;
 	int				offset;
 };
@@ -139,14 +159,21 @@ struct hns_roce_wq {
 struct hns_roce_qp {
 	struct ibv_qp			ibv_qp;
 	struct hns_roce_buf		buf;
+	int				max_inline_data;
+	int				buf_size;
 	unsigned int			sq_signal_bits;
 	struct hns_roce_wq		sq;
 	struct hns_roce_wq		rq;
+	int				port_num;
+	int				sl;
 };
 
 struct hns_roce_u_hw {
 	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
 	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	int (*destroy_qp)(struct ibv_qp *ibqp);
 };
 
 static inline unsigned long align(unsigned long val, unsigned long align)
@@ -174,6 +201,16 @@ static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
 	return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
 }
 
+static inline struct hns_roce_srq *to_hr_srq(struct ibv_srq *ibv_srq)
+{
+	return container_of(ibv_srq, struct hns_roce_srq, ibv_srq);
+}
+
+static inline struct  hns_roce_qp *to_hr_qp(struct ibv_qp *ibv_qp)
+{
+	return container_of(ibv_qp, struct hns_roce_qp, ibv_qp);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -193,10 +230,18 @@ struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
 int hns_roce_u_destroy_cq(struct ibv_cq *cq);
 void hns_roce_u_cq_event(struct ibv_cq *cq);
 
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *attr);
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+			int attr_mask, struct ibv_qp_init_attr *init_attr);
+
 int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
 		       int page_size);
 void hns_roce_free_buf(struct hns_roce_buf *buf);
 
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp);
+
 extern struct hns_roce_u_hw hns_roce_u_hw_v1;
 
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 1e62a7e..e78f967 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -58,4 +58,12 @@ struct hns_roce_create_cq_resp {
 	__u32				reserved;
 };
 
+struct hns_roce_create_qp {
+	struct ibv_create_qp		ibv_cmd;
+	__u64				buf_addr;
+	__u8				log_sq_bb_count;
+	__u8				log_sq_stride;
+	__u8				reserved[5];
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index 2676021..fb81634 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -150,6 +150,16 @@ static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
 	}
 }
 
+static void hns_roce_clear_qp(struct hns_roce_context *ctx, uint32_t qpn)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (!--ctx->qp_table[tind].refcnt)
+		free(ctx->qp_table[tind].table);
+	else
+		ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL;
+}
+
 static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
 				struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
 {
@@ -364,7 +374,152 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
 	return 0;
 }
 
+static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
+				   struct hns_roce_srq *srq)
+{
+	int nfreed = 0;
+	uint32_t prod_index;
+	uint8_t owner_bit = 0;
+	struct hns_roce_cqe *cqe, *dest;
+	struct hns_roce_context *ctx = to_hr_ctx(cq->ibv_cq.context);
+
+	for (prod_index = cq->cons_index; get_sw_cqe(cq, prod_index);
+	     ++prod_index)
+		if (prod_index == cq->cons_index + cq->ibv_cq.cqe)
+			break;
+
+	while ((int) --prod_index - (int) cq->cons_index >= 0) {
+		cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
+		if ((roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+			      CQE_BYTE_16_LOCAL_QPN_S) & 0xffffff) == qpn) {
+			++nfreed;
+		} else if (nfreed) {
+			dest = get_cqe(cq,
+				       (prod_index + nfreed) & cq->ibv_cq.cqe);
+			owner_bit = roce_get_bit(dest->cqe_byte_4,
+						 CQE_BYTE_4_OWNER_S);
+			memcpy(dest, cqe, sizeof(*cqe));
+			roce_set_bit(dest->cqe_byte_4, CQE_BYTE_4_OWNER_S,
+				     owner_bit);
+		}
+	}
+
+	if (nfreed) {
+		cq->cons_index += nfreed;
+		wmb();
+		hns_roce_update_cq_cons_index(ctx, cq);
+	}
+}
+
+static void hns_roce_v1_cq_clean(struct hns_roce_cq *cq, unsigned int qpn,
+				 struct hns_roce_srq *srq)
+{
+	pthread_spin_lock(&cq->lock);
+	__hns_roce_v1_cq_clean(cq, qpn, srq);
+	pthread_spin_unlock(&cq->lock);
+}
+
+static int hns_roce_u_v1_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+				   int attr_mask)
+{
+	int ret;
+	struct ibv_modify_qp cmd;
+	struct hns_roce_qp *hr_qp = to_hr_qp(qp);
+
+	ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+	if (!ret && (attr_mask & IBV_QP_STATE) &&
+	    attr->qp_state == IBV_QPS_RESET) {
+		hns_roce_v1_cq_clean(to_hr_cq(qp->recv_cq), qp->qp_num,
+				     qp->srq ? to_hr_srq(qp->srq) : NULL);
+		if (qp->send_cq != qp->recv_cq)
+			hns_roce_v1_cq_clean(to_hr_cq(qp->send_cq), qp->qp_num,
+					     NULL);
+
+		hns_roce_init_qp_indices(to_hr_qp(qp));
+	}
+
+	if (!ret && (attr_mask & IBV_QP_PORT)) {
+		hr_qp->port_num = attr->port_num;
+		printf("hr_qp->port_num= 0x%x\n", hr_qp->port_num);
+	}
+
+	hr_qp->sl = attr->ah_attr.sl;
+
+	return ret;
+}
+
+static void hns_roce_lock_cqs(struct ibv_qp *qp)
+{
+	struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+	struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+	if (send_cq == recv_cq) {
+		pthread_spin_lock(&send_cq->lock);
+	} else if (send_cq->cqn < recv_cq->cqn) {
+		pthread_spin_lock(&send_cq->lock);
+		pthread_spin_lock(&recv_cq->lock);
+	} else {
+		pthread_spin_lock(&recv_cq->lock);
+		pthread_spin_lock(&send_cq->lock);
+	}
+}
+
+static void hns_roce_unlock_cqs(struct ibv_qp *qp)
+{
+	struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+	struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+	if (send_cq == recv_cq) {
+		pthread_spin_unlock(&send_cq->lock);
+	} else if (send_cq->cqn < recv_cq->cqn) {
+		pthread_spin_unlock(&recv_cq->lock);
+		pthread_spin_unlock(&send_cq->lock);
+	} else {
+		pthread_spin_unlock(&send_cq->lock);
+		pthread_spin_unlock(&recv_cq->lock);
+	}
+}
+
+static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
+{
+	int ret;
+	struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+	pthread_mutex_lock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+	ret = ibv_cmd_destroy_qp(ibqp);
+	if (ret) {
+		pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+		return ret;
+	}
+
+	hns_roce_lock_cqs(ibqp);
+
+	__hns_roce_v1_cq_clean(to_hr_cq(ibqp->recv_cq), ibqp->qp_num,
+			       ibqp->srq ? to_hr_srq(ibqp->srq) : NULL);
+
+	if (ibqp->send_cq != ibqp->recv_cq)
+		__hns_roce_v1_cq_clean(to_hr_cq(ibqp->send_cq), ibqp->qp_num,
+				       NULL);
+
+	hns_roce_clear_qp(to_hr_ctx(ibqp->context), ibqp->qp_num);
+
+	hns_roce_unlock_cqs(ibqp);
+	pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+
+	free(qp->sq.wrid);
+	if (qp->rq.wqe_cnt)
+		free(qp->rq.wrid);
+
+	hns_roce_free_buf(&qp->buf);
+	free(qp);
+
+	return ret;
+}
+
 struct hns_roce_u_hw hns_roce_u_hw_v1 = {
 	.poll_cq = hns_roce_u_v1_poll_cq,
 	.arm_cq = hns_roce_u_v1_arm_cq,
+	.modify_qp = hns_roce_u_v1_modify_qp,
+	.destroy_qp = hns_roce_u_v1_destroy_qp,
 };
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 077cddc..1615f2e 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -38,11 +38,19 @@
 #include <sys/mman.h>
 #include <fcntl.h>
 #include <unistd.h>
-
+#include <ccan/minmax.h>
 #include "hns_roce_u.h"
 #include "hns_roce_u_abi.h"
 #include "hns_roce_u_hw_v1.h"
 
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp)
+{
+	qp->sq.head = 0;
+	qp->sq.tail = 0;
+	qp->rq.head = 0;
+	qp->rq.tail = 0;
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr)
 {
@@ -163,6 +171,29 @@ static int align_cq_size(int req)
 	return nent;
 }
 
+static int align_qp_size(int req)
+{
+	int nent;
+
+	for (nent = HNS_ROCE_MIN_WQE_NUM; nent < req; nent <<= 1)
+		;
+
+	return nent;
+}
+
+static void hns_roce_set_sq_sizes(struct hns_roce_qp *qp,
+				  struct ibv_qp_cap *cap, enum ibv_qp_type type)
+{
+	struct hns_roce_context *ctx = to_hr_ctx(qp->ibv_qp.context);
+
+	qp->sq.max_gs = 2;
+	cap->max_send_sge = min(ctx->max_sge, qp->sq.max_gs);
+	qp->sq.max_post = min(ctx->max_qp_wr, qp->sq.wqe_cnt);
+	cap->max_send_wr = qp->sq.max_post;
+	qp->max_inline_data  = 32;
+	cap->max_inline_data = qp->max_inline_data;
+}
+
 static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
 {
 	if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
@@ -189,6 +220,17 @@ static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
 	return 0;
 }
 
+static void hns_roce_calc_sq_wqe_size(struct ibv_qp_cap *cap,
+				      enum ibv_qp_type type,
+				      struct hns_roce_qp *qp)
+{
+	int size = sizeof(struct hns_roce_rc_send_wqe);
+
+	for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
+	     qp->sq.wqe_shift++)
+		;
+}
+
 struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
 				    struct ibv_comp_channel *channel,
 				    int comp_vector)
@@ -266,3 +308,218 @@ int hns_roce_u_destroy_cq(struct ibv_cq *cq)
 
 	return ret;
 }
+
+static int hns_roce_verify_qp(struct ibv_qp_init_attr *attr,
+			      struct hns_roce_context *context)
+{
+	if (attr->cap.max_send_wr < HNS_ROCE_MIN_WQE_NUM) {
+		fprintf(stderr,
+			"max_send_wr = %d, less than minimum WQE number.\n",
+			attr->cap.max_send_wr);
+		attr->cap.max_send_wr = HNS_ROCE_MIN_WQE_NUM;
+	}
+
+	if (attr->cap.max_recv_wr < HNS_ROCE_MIN_WQE_NUM) {
+		fprintf(stderr,
+			"max_recv_wr = %d, less than minimum WQE number.\n",
+			attr->cap.max_recv_wr);
+		attr->cap.max_recv_wr = HNS_ROCE_MIN_WQE_NUM;
+	}
+
+	if (attr->cap.max_recv_sge < 1)
+		attr->cap.max_recv_sge = 1;
+	if (attr->cap.max_send_wr > context->max_qp_wr ||
+	    attr->cap.max_recv_wr > context->max_qp_wr ||
+	    attr->cap.max_send_sge > context->max_sge  ||
+	    attr->cap.max_recv_sge > context->max_sge)
+		return -1;
+
+	if ((attr->qp_type != IBV_QPT_RC) && (attr->qp_type != IBV_QPT_UD))
+		return -1;
+
+	if ((attr->qp_type == IBV_QPT_RC) &&
+	    (attr->cap.max_inline_data > HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN))
+		return -1;
+
+	if (attr->qp_type == IBV_QPT_UC)
+		return -1;
+
+	return 0;
+}
+
+static int hns_roce_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
+				 enum ibv_qp_type type, struct hns_roce_qp *qp)
+{
+	qp->sq.wrid =
+		(unsigned long *)malloc(qp->sq.wqe_cnt * sizeof(uint64_t));
+	if (!qp->sq.wrid)
+		return -1;
+
+	if (qp->rq.wqe_cnt) {
+		qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+		if (!qp->rq.wrid) {
+			free(qp->sq.wrid);
+			return -1;
+		}
+	}
+
+	for (qp->rq.wqe_shift = 4;
+	     1 << qp->rq.wqe_shift < sizeof(struct hns_roce_rc_send_wqe);
+	     qp->rq.wqe_shift++)
+		;
+
+	qp->buf_size = align((qp->sq.wqe_cnt << qp->sq.wqe_shift), 0x1000) +
+		      (qp->rq.wqe_cnt << qp->rq.wqe_shift);
+
+	if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
+		qp->rq.offset = 0;
+		qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
+	} else {
+		qp->rq.offset = align((qp->sq.wqe_cnt << qp->sq.wqe_shift),
+				       0x1000);
+		qp->sq.offset = 0;
+	}
+
+	if (hns_roce_alloc_buf(&qp->buf, align(qp->buf_size, 0x1000),
+			       to_hr_dev(pd->context->device)->page_size)) {
+		free(qp->sq.wrid);
+		free(qp->rq.wrid);
+		return -1;
+	}
+
+	memset(qp->buf.buf, 0, qp->buf_size);
+
+	return 0;
+}
+
+static int hns_roce_store_qp(struct hns_roce_context *ctx, uint32_t qpn,
+			     struct hns_roce_qp *qp)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (!ctx->qp_table[tind].refcnt) {
+		ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1,
+						  sizeof(struct hns_roce_qp *));
+		if (!ctx->qp_table[tind].table)
+			return -1;
+	}
+
+	++ctx->qp_table[tind].refcnt;
+	ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp;
+
+	return 0;
+}
+
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *attr)
+{
+	int ret;
+	struct hns_roce_qp *qp = NULL;
+	struct hns_roce_create_qp cmd;
+	struct ibv_create_qp_resp resp;
+	struct hns_roce_context *context = to_hr_ctx(pd->context);
+
+	if (hns_roce_verify_qp(attr, context)) {
+		fprintf(stderr, "hns_roce_verify_sizes failed!\n");
+		return NULL;
+	}
+
+	qp = malloc(sizeof(*qp));
+	if (!qp) {
+		fprintf(stderr, "malloc failed!\n");
+		return NULL;
+	}
+
+	hns_roce_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
+	qp->sq.wqe_cnt = align_qp_size(attr->cap.max_send_wr);
+	qp->rq.wqe_cnt = align_qp_size(attr->cap.max_recv_wr);
+
+	if (hns_roce_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp)) {
+		fprintf(stderr, "hns_roce_alloc_qp_buf failed!\n");
+		goto err;
+	}
+
+	hns_roce_init_qp_indices(qp);
+
+	if (pthread_spin_init(&qp->sq.lock, PTHREAD_PROCESS_PRIVATE) ||
+	    pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) {
+		fprintf(stderr, "pthread_spin_init failed!\n");
+		goto err_free;
+	}
+
+	cmd.buf_addr = (uintptr_t) qp->buf.buf;
+	cmd.log_sq_stride = qp->sq.wqe_shift;
+	for (cmd.log_sq_bb_count = 0; qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count;
+	     ++cmd.log_sq_bb_count)
+		;
+
+	memset(cmd.reserved, 0, sizeof(cmd.reserved));
+
+	pthread_mutex_lock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+	ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd,
+				sizeof(cmd), &resp, sizeof(resp));
+	if (ret) {
+		fprintf(stderr, "ibv_cmd_create_qp failed!\n");
+		goto err_rq_db;
+	}
+
+	ret = hns_roce_store_qp(to_hr_ctx(pd->context), qp->ibv_qp.qp_num, qp);
+	if (ret) {
+		fprintf(stderr, "hns_roce_store_qp failed!\n");
+		goto err_destroy;
+	}
+	pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+	qp->rq.wqe_cnt = attr->cap.max_recv_wr;
+	qp->rq.max_gs	= attr->cap.max_recv_sge;
+
+	/* adjust rq maxima to not exceed reported device maxima */
+	attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
+	attr->cap.max_recv_sge = min(context->max_sge, attr->cap.max_recv_sge);
+
+	qp->rq.max_post = attr->cap.max_recv_wr;
+	hns_roce_set_sq_sizes(qp, &attr->cap, attr->qp_type);
+
+	qp->sq_signal_bits = attr->sq_sig_all ? 0 : 1;
+
+	return &qp->ibv_qp;
+
+err_destroy:
+	ibv_cmd_destroy_qp(&qp->ibv_qp);
+
+err_rq_db:
+	pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+err_free:
+	free(qp->sq.wrid);
+	if (qp->rq.wqe_cnt)
+		free(qp->rq.wrid);
+	hns_roce_free_buf(&qp->buf);
+
+err:
+	free(qp);
+
+	return NULL;
+}
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+			int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+	int ret;
+	struct ibv_query_qp cmd;
+	struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+	ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd,
+			       sizeof(cmd));
+	if (ret)
+		return ret;
+
+	init_attr->cap.max_send_wr = qp->sq.max_post;
+	init_attr->cap.max_send_sge = qp->sq.max_gs;
+	init_attr->cap.max_inline_data = qp->max_inline_data;
+
+	attr->cap = init_attr->cap;
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 4/7] libhns: Add verbs of cq support
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the relatived cq verbs for userspace
of hns, include:
    1. create_cq
    2. poll_cq
    3. req_notify_cq
    4. cq_event
    5. destroy_cq

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- Delete the unused code

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  57 +++++-
 providers/hns/hns_roce_u.h       |  94 ++++++++++
 providers/hns/hns_roce_u_abi.h   |  12 ++
 providers/hns/hns_roce_u_buf.c   |  61 +++++++
 providers/hns/hns_roce_u_db.h    |  54 ++++++
 providers/hns/hns_roce_u_hw_v1.c | 370 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h | 163 +++++++++++++++++
 providers/hns/hns_roce_u_verbs.c | 116 ++++++++++++
 8 files changed, 922 insertions(+), 5 deletions(-)
 create mode 100644 providers/hns/hns_roce_u_buf.c
 create mode 100644 providers/hns/hns_roce_u_db.h
 create mode 100644 providers/hns/hns_roce_u_hw_v1.c
 create mode 100644 providers/hns/hns_roce_u_hw_v1.h

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 53e2720..e435bea 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -46,15 +46,19 @@
 
 static const struct {
 	char	 hid[HID_LEN];
+	void	 *data;
+	int	 version;
 } acpi_table[] = {
-	{"acpi:HISI00D1:"},
-	{},
+	 {"acpi:HISI00D1:", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
+	 {},
 };
 
 static const struct {
 	char	 compatible[DEV_MATCH_LEN];
+	void	 *data;
+	int	 version;
 } dt_table[] = {
-	{"hisilicon,hns-roce-v1"},
+	{"hisilicon,hns-roce-v1", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
 	{},
 };
 
@@ -93,6 +97,21 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 		goto err_free;
 	}
 
+	if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+		/*
+		 * when vma->vm_pgoff is 1, the cq_tptr_base includes 64K CQ,
+		 * a pointer of CQ need 2B size
+		 */
+		context->cq_tptr_base = mmap(NULL, HNS_ROCE_CQ_DB_BUF_SIZE,
+					     PROT_READ | PROT_WRITE, MAP_SHARED,
+					     cmd_fd, HNS_ROCE_TPTR_OFFSET);
+		if (context->cq_tptr_base == MAP_FAILED) {
+			fprintf(stderr,
+				PFX "Warning: Failed to mmap cq_tptr page.\n");
+			goto db_free;
+		}
+	}
+
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
 	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
@@ -102,6 +121,12 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.reg_mr	   = hns_roce_u_reg_mr;
 	context->ibv_ctx.ops.dereg_mr	   = hns_roce_u_dereg_mr;
 
+	context->ibv_ctx.ops.create_cq     = hns_roce_u_create_cq;
+	context->ibv_ctx.ops.poll_cq	   = hr_dev->u_hw->poll_cq;
+	context->ibv_ctx.ops.req_notify_cq = hr_dev->u_hw->arm_cq;
+	context->ibv_ctx.ops.cq_event	   = hns_roce_u_cq_event;
+	context->ibv_ctx.ops.destroy_cq    = hns_roce_u_destroy_cq;
+
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
 
@@ -112,6 +137,16 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	return &context->ibv_ctx;
 
 tptr_free:
+	if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+		if (munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE))
+			fprintf(stderr, PFX "Warning: Munmap tptr failed.\n");
+			context->cq_tptr_base = NULL;
+	}
+
+db_free:
+	munmap(context->uar, to_hr_dev(ibdev)->page_size);
+	context->uar = NULL;
+
 err_free:
 	free(context);
 	return NULL;
@@ -122,6 +157,8 @@ static void hns_roce_free_context(struct ibv_context *ibctx)
 	struct hns_roce_context *context = to_hr_ctx(ibctx);
 
 	munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+	if (to_hr_dev(ibctx->device)->hw_version == HNS_ROCE_HW_VER1)
+		munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE);
 
 	context->uar = NULL;
 
@@ -140,18 +177,26 @@ static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
 	struct hns_roce_device  *dev;
 	char			 value[128];
 	int			 i;
+	void			 *u_hw;
+	int			 hw_version;
 
 	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
 				value, sizeof(value)) > 0)
 		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
-			if (!strcmp(value, acpi_table[i].hid))
+			if (!strcmp(value, acpi_table[i].hid)) {
+				u_hw = acpi_table[i].data;
+				hw_version = acpi_table[i].version;
 				goto found;
+			}
 
 	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
 				value, sizeof(value)) > 0)
 		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
-			if (!strcmp(value, dt_table[i].compatible))
+			if (!strcmp(value, dt_table[i].compatible)) {
+				u_hw = dt_table[i].data;
+				hw_version = dt_table[i].version;
 				goto found;
+			}
 
 	return NULL;
 
@@ -164,6 +209,8 @@ found:
 	}
 
 	dev->ibv_dev.ops = hns_roce_dev_ops;
+	dev->u_hw = (struct hns_roce_u_hw *)u_hw;
+	dev->hw_version = hw_version;
 	dev->page_size   = sysconf(_SC_PAGESIZE);
 	return &dev->ibv_dev;
 }
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 5b73794..c3e364d 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -40,18 +40,53 @@
 #include <infiniband/verbs.h>
 #include <ccan/container_of.h>
 
+#define HNS_ROCE_CQE_ENTRY_SIZE		0x20
+
+#define HNS_ROCE_MAX_CQ_NUM		0x10000
+#define HNS_ROCE_MIN_CQE_NUM		0x40
+#define HNS_ROCE_CQ_DB_BUF_SIZE		((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
+#define HNS_ROCE_TPTR_OFFSET		0x1000
 #define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
 
 #define PFX				"hns: "
 
+#define roce_get_field(origin, mask, shift) \
+	(((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+	roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+	do { \
+		(origin) &= (~(mask)); \
+		(origin) |= (((unsigned int)(val) << (shift)) & (mask)); \
+	} while (0)
+
+#define roce_set_bit(origin, shift, val) \
+	roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
 enum {
 	HNS_ROCE_QP_TABLE_BITS		= 8,
 	HNS_ROCE_QP_TABLE_SIZE		= 1 << HNS_ROCE_QP_TABLE_BITS,
 };
 
+/* operation type list */
+enum {
+	/* rq&srq operation */
+	HNS_ROCE_OPCODE_SEND_DATA_RECEIVE         = 0x06,
+	HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE     = 0x07,
+};
+
 struct hns_roce_device {
 	struct ibv_device		ibv_dev;
 	int				page_size;
+	struct hns_roce_u_hw		*u_hw;
+	int				hw_version;
+};
+
+struct hns_roce_buf {
+	void				*buf;
+	unsigned int			length;
 };
 
 struct hns_roce_context {
@@ -59,7 +94,10 @@ struct hns_roce_context {
 	void				*uar;
 	pthread_spinlock_t		uar_lock;
 
+	void				*cq_tptr_base;
+
 	struct {
+		struct hns_roce_qp	**table;
 		int			refcnt;
 	} qp_table[HNS_ROCE_QP_TABLE_SIZE];
 
@@ -78,6 +116,44 @@ struct hns_roce_pd {
 	unsigned int			pdn;
 };
 
+struct hns_roce_cq {
+	struct ibv_cq			ibv_cq;
+	struct hns_roce_buf		buf;
+	pthread_spinlock_t		lock;
+	unsigned int			cqn;
+	unsigned int			cq_depth;
+	unsigned int			cons_index;
+	unsigned int			*set_ci_db;
+	unsigned int			*arm_db;
+	int				arm_sn;
+};
+
+struct hns_roce_wq {
+	unsigned long			*wrid;
+	unsigned int			wqe_cnt;
+	unsigned int			tail;
+	int				wqe_shift;
+	int				offset;
+};
+
+struct hns_roce_qp {
+	struct ibv_qp			ibv_qp;
+	struct hns_roce_buf		buf;
+	unsigned int			sq_signal_bits;
+	struct hns_roce_wq		sq;
+	struct hns_roce_wq		rq;
+};
+
+struct hns_roce_u_hw {
+	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
+	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+};
+
+static inline unsigned long align(unsigned long val, unsigned long align)
+{
+	return (val + align - 1) & ~(align - 1);
+}
+
 static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
 {
 	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -93,6 +169,11 @@ static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
 	return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
 }
 
+static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
+{
+	return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -105,4 +186,17 @@ struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
 				 int access);
 int hns_roce_u_dereg_mr(struct ibv_mr *mr);
 
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq);
+void hns_roce_u_cq_event(struct ibv_cq *cq);
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+		       int page_size);
+void hns_roce_free_buf(struct hns_roce_buf *buf);
+
+extern struct hns_roce_u_hw hns_roce_u_hw_v1;
+
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 0a0cd0c..1e62a7e 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -46,4 +46,16 @@ struct hns_roce_alloc_pd_resp {
 	__u32				reserved;
 };
 
+struct hns_roce_create_cq {
+	struct ibv_create_cq		ibv_cmd;
+	__u64				buf_addr;
+	__u64				db_addr;
+};
+
+struct hns_roce_create_cq_resp {
+	struct ibv_create_cq_resp	ibv_resp;
+	__u32				cqn;
+	__u32				reserved;
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_buf.c b/providers/hns/hns_roce_u_buf.c
new file mode 100644
index 0000000..f92ea65
--- /dev/null
+++ b/providers/hns/hns_roce_u_buf.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <errno.h>
+#include <sys/mman.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+		       int page_size)
+{
+	int ret;
+
+	buf->length = align(size, page_size);
+	buf->buf = mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf->buf == MAP_FAILED)
+		return errno;
+
+	ret = ibv_dontfork_range(buf->buf, size);
+	if (ret)
+		munmap(buf->buf, buf->length);
+
+	return ret;
+}
+
+void hns_roce_free_buf(struct hns_roce_buf *buf)
+{
+	ibv_dofork_range(buf->buf, buf->length);
+
+	munmap(buf->buf, buf->length);
+}
diff --git a/providers/hns/hns_roce_u_db.h b/providers/hns/hns_roce_u_db.h
new file mode 100644
index 0000000..76d13ce
--- /dev/null
+++ b/providers/hns/hns_roce_u_db.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/types.h>
+
+#include "hns_roce_u.h"
+
+#ifndef _HNS_ROCE_U_DB_H
+#define _HNS_ROCE_U_DB_H
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0])
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1])
+#else
+#error __BYTE_ORDER not defined
+#endif
+
+static inline void hns_roce_write64(uint32_t val[2],
+				    struct hns_roce_context *ctx, int offset)
+{
+	*(volatile uint64_t *) (ctx->uar + offset) = HNS_ROCE_PAIR_TO_64(val);
+}
+
+#endif /* _HNS_ROCE_U_DB_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
new file mode 100644
index 0000000..2676021
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -0,0 +1,370 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <malloc.h>
+#include "hns_roce_u_db.h"
+#include "hns_roce_u_hw_v1.h"
+#include "hns_roce_u.h"
+
+static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
+					  struct hns_roce_cq *cq)
+{
+	struct hns_roce_cq_db cq_db;
+
+	cq_db.u32_4 = 0;
+	cq_db.u32_8 = 0;
+
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+		       CQ_DB_U32_8_CMD_MDF_S, 0);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+		       cq->cqn);
+	roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+		       CQ_DB_U32_4_CONS_IDX_S,
+		       cq->cons_index & ((cq->cq_depth << 1) - 1));
+
+	hns_roce_write64((uint32_t *)&cq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_handle_error_cqe(struct hns_roce_cqe *cqe,
+				      struct ibv_wc *wc)
+{
+	switch (roce_get_field(cqe->cqe_byte_4,
+			       CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+			       CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) &
+		HNS_ROCE_CQE_STATUS_MASK) {
+		fprintf(stderr, PFX "error cqe!\n");
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR:
+		wc->status = IBV_WC_LOC_LEN_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR:
+		wc->status = IBV_WC_LOC_QP_OP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR:
+		wc->status = IBV_WC_LOC_PROT_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR:
+		wc->status = IBV_WC_WR_FLUSH_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR:
+		wc->status = IBV_WC_MW_BIND_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR:
+		wc->status = IBV_WC_BAD_RESP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+		wc->status = IBV_WC_LOC_ACCESS_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
+		wc->status = IBV_WC_REM_INV_REQ_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+		wc->status = IBV_WC_REM_ACCESS_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR:
+		wc->status = IBV_WC_REM_OP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
+		wc->status = IBV_WC_RETRY_EXC_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
+		wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+		break;
+	default:
+		wc->status = IBV_WC_GENERAL_ERR;
+		break;
+	}
+}
+
+static struct hns_roce_cqe *get_cqe(struct hns_roce_cq *cq, int entry)
+{
+	return cq->buf.buf + entry * HNS_ROCE_CQE_ENTRY_SIZE;
+}
+
+static void *get_sw_cqe(struct hns_roce_cq *cq, int n)
+{
+	struct hns_roce_cqe *cqe = get_cqe(cq, n & cq->ibv_cq.cqe);
+
+	return (!!(roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_OWNER_S)) ^
+		!!(n & (cq->ibv_cq.cqe + 1))) ? cqe : NULL;
+}
+
+static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
+{
+	return get_sw_cqe(cq, cq->cons_index);
+}
+
+static void *get_send_wqe(struct hns_roce_qp *qp, int n)
+{
+	if ((n < 0) || (n > qp->sq.wqe_cnt)) {
+		printf("sq wqe index:%d,sq wqe cnt:%d\r\n", n, qp->sq.wqe_cnt);
+		return NULL;
+	}
+
+	return (void *)((uint64_t)(qp->buf.buf) + qp->sq.offset +
+				  (n << qp->sq.wqe_shift));
+}
+
+static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
+					    uint32_t qpn)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (ctx->qp_table[tind].refcnt) {
+		return ctx->qp_table[tind].table[qpn & ctx->qp_table_mask];
+	} else {
+		printf("hns_roce_find_qp fail!\n");
+		return NULL;
+	}
+}
+
+static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
+				struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
+{
+	uint32_t qpn;
+	int is_send;
+	uint16_t wqe_ctr;
+	uint32_t local_qpn;
+	struct hns_roce_wq *wq = NULL;
+	struct hns_roce_cqe *cqe = NULL;
+	struct hns_roce_wqe_ctrl_seg *sq_wqe = NULL;
+
+	/* According to CI, find the relative cqe */
+	cqe = next_cqe_sw(cq);
+	if (!cqe)
+		return CQ_EMPTY;
+
+	/* Get the next cqe, CI will be added gradually */
+	++cq->cons_index;
+
+	rmb();
+
+	qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+			     CQE_BYTE_16_LOCAL_QPN_S);
+
+	is_send = (roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_SQ_RQ_FLAG_S) ==
+		   HNS_ROCE_CQE_IS_SQ);
+
+	local_qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+				   CQE_BYTE_16_LOCAL_QPN_S);
+
+	/* if qp is zero, it will not get the correct qpn */
+	if (!*cur_qp ||
+	    (local_qpn & HNS_ROCE_CQE_QPN_MASK) != (*cur_qp)->ibv_qp.qp_num) {
+
+		*cur_qp = hns_roce_find_qp(to_hr_ctx(cq->ibv_cq.context),
+					   qpn & 0xffffff);
+		if (!*cur_qp) {
+			fprintf(stderr, PFX "can't find qp!\n");
+			return CQ_POLL_ERR;
+		}
+	}
+	wc->qp_num = qpn & 0xffffff;
+
+	if (is_send) {
+		wq = &(*cur_qp)->sq;
+		/*
+		 * if sq_signal_bits is 1, the tail pointer first update to
+		 * the wqe corresponding the current cqe
+		 */
+		if ((*cur_qp)->sq_signal_bits) {
+			wqe_ctr = (uint16_t)(roce_get_field(cqe->cqe_byte_4,
+						CQE_BYTE_4_WQE_INDEX_M,
+						CQE_BYTE_4_WQE_INDEX_S));
+			/*
+			 * wq->tail will plus a positive number every time,
+			 * when wq->tail exceeds 32b, it is 0 and acc
+			 */
+			wq->tail += (wqe_ctr - (uint16_t) wq->tail) &
+				    (wq->wqe_cnt - 1);
+		}
+		/* write the wr_id of wq into the wc */
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		++wq->tail;
+	} else {
+		wq = &(*cur_qp)->rq;
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		++wq->tail;
+	}
+
+	/*
+	 * HW maintains wc status, set the err type and directly return, after
+	 * generated the incorrect CQE
+	 */
+	if (roce_get_field(cqe->cqe_byte_4,
+	    CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+	    CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) != HNS_ROCE_CQE_SUCCESS) {
+		hns_roce_handle_error_cqe(cqe, wc);
+		return CQ_OK;
+	}
+	wc->status = IBV_WC_SUCCESS;
+
+	/*
+	 * According to the opcode type of cqe, mark the opcode and other
+	 * information of wc
+	 */
+	if (is_send) {
+		/* Get opcode and flag before update the tail point for send */
+		sq_wqe = (struct hns_roce_wqe_ctrl_seg *)
+			 (uint64_t)get_send_wqe(*cur_qp,
+						roce_get_field(cqe->cqe_byte_4,
+						CQE_BYTE_4_WQE_INDEX_M,
+						CQE_BYTE_4_WQE_INDEX_S));
+		switch (sq_wqe->flag & HNS_ROCE_WQE_OPCODE_MASK) {
+		case HNS_ROCE_WQE_OPCODE_SEND:
+			wc->opcode = IBV_WC_SEND;
+			break;
+		case HNS_ROCE_WQE_OPCODE_RDMA_READ:
+			wc->opcode = IBV_WC_RDMA_READ;
+			wc->byte_len = cqe->byte_cnt;
+			break;
+		case HNS_ROCE_WQE_OPCODE_RDMA_WRITE:
+			wc->opcode = IBV_WC_RDMA_WRITE;
+			break;
+		case HNS_ROCE_WQE_OPCODE_BIND_MW2:
+			wc->opcode = IBV_WC_BIND_MW;
+			break;
+		default:
+			wc->status = IBV_WC_GENERAL_ERR;
+			break;
+		}
+		wc->wc_flags = (sq_wqe->flag & HNS_ROCE_WQE_IMM ?
+				IBV_WC_WITH_IMM : 0);
+	} else {
+		/* Get opcode and flag in rq&srq */
+		wc->byte_len = (cqe->byte_cnt);
+
+		switch (roce_get_field(cqe->cqe_byte_4,
+				       CQE_BYTE_4_OPERATION_TYPE_M,
+				       CQE_BYTE_4_OPERATION_TYPE_S) &
+			HNS_ROCE_CQE_OPCODE_MASK) {
+		case HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE:
+			wc->opcode   = IBV_WC_RECV_RDMA_WITH_IMM;
+			wc->wc_flags = IBV_WC_WITH_IMM;
+			wc->imm_data = cqe->immediate_data;
+			break;
+		case HNS_ROCE_OPCODE_SEND_DATA_RECEIVE:
+			if (roce_get_bit(cqe->cqe_byte_4,
+					 CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S)) {
+				wc->opcode   = IBV_WC_RECV;
+				wc->wc_flags = IBV_WC_WITH_IMM;
+				wc->imm_data = cqe->immediate_data;
+			} else {
+				wc->opcode   = IBV_WC_RECV;
+				wc->wc_flags = 0;
+			}
+			break;
+		default:
+			wc->status = IBV_WC_GENERAL_ERR;
+			break;
+		}
+	}
+
+	return CQ_OK;
+}
+
+static int hns_roce_u_v1_poll_cq(struct ibv_cq *ibvcq, int ne,
+				 struct ibv_wc *wc)
+{
+	int npolled;
+	int err = CQ_OK;
+	struct hns_roce_qp *qp = NULL;
+	struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvcq->context);
+	struct hns_roce_device *dev = to_hr_dev(ibvcq->context->device);
+
+	pthread_spin_lock(&cq->lock);
+
+	for (npolled = 0; npolled < ne; ++npolled) {
+		err = hns_roce_v1_poll_one(cq, &qp, wc + npolled);
+		if (err != CQ_OK)
+			break;
+	}
+
+	if (npolled) {
+		if (dev->hw_version == HNS_ROCE_HW_VER1) {
+			*cq->set_ci_db = (unsigned short)(cq->cons_index &
+					 ((cq->cq_depth << 1) - 1));
+			mb();
+		}
+
+		hns_roce_update_cq_cons_index(ctx, cq);
+	}
+
+	pthread_spin_unlock(&cq->lock);
+
+	return err == CQ_POLL_ERR ? err : npolled;
+}
+
+/**
+ * hns_roce_u_v1_arm_cq - request completion notification on a CQ
+ * @ibvcq: The completion queue to request notification for.
+ * @solicited: If non-zero, a event will be generated only for
+ *	      the next solicited CQ entry. If zero, any CQ entry,
+ *	      solicited or not, will generate an event
+ */
+static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
+{
+	uint32_t ci;
+	uint32_t solicited_flag;
+	struct hns_roce_cq_db cq_db;
+	struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+
+	ci  = cq->cons_index & ((cq->cq_depth << 1) - 1);
+	solicited_flag = solicited ? HNS_ROCE_CQ_DB_REQ_SOL :
+				     HNS_ROCE_CQ_DB_REQ_NEXT;
+
+	cq_db.u32_4 = 0;
+	cq_db.u32_8 = 0;
+
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+		       CQ_DB_U32_8_CMD_MDF_S, 1);
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_NOTIFY_TYPE_S, solicited_flag);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+		       cq->cqn);
+	roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+		       CQ_DB_U32_4_CONS_IDX_S, ci);
+
+	hns_roce_write64((uint32_t *)&cq_db, to_hr_ctx(ibvcq->context),
+			  ROCEE_DB_OTHERS_L_0_REG);
+	return 0;
+}
+
+struct hns_roce_u_hw hns_roce_u_hw_v1 = {
+	.poll_cq = hns_roce_u_v1_poll_cq,
+	.arm_cq = hns_roce_u_v1_arm_cq,
+};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
new file mode 100644
index 0000000..b249f54
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_HW_V1_H
+#define _HNS_ROCE_U_HW_V1_H
+
+#define HNS_ROCE_CQ_DB_REQ_SOL			1
+#define HNS_ROCE_CQ_DB_REQ_NEXT			0
+
+#define HNS_ROCE_CQE_IS_SQ			0
+
+#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN	32
+
+enum {
+	HNS_ROCE_WQE_IMM		= 1 << 23,
+	HNS_ROCE_WQE_OPCODE_SEND        = 0 << 16,
+	HNS_ROCE_WQE_OPCODE_RDMA_READ   = 1 << 16,
+	HNS_ROCE_WQE_OPCODE_RDMA_WRITE  = 2 << 16,
+	HNS_ROCE_WQE_OPCODE_BIND_MW2    = 6 << 16,
+	HNS_ROCE_WQE_OPCODE_MASK        = 15 << 16,
+};
+
+struct hns_roce_wqe_ctrl_seg {
+	__be32		sgl_pa_h;
+	__be32		flag;
+};
+
+enum {
+	CQ_OK				=  0,
+	CQ_EMPTY			= -1,
+	CQ_POLL_ERR			= -2,
+};
+
+enum {
+	HNS_ROCE_CQE_QPN_MASK		= 0x3ffff,
+	HNS_ROCE_CQE_STATUS_MASK	= 0x1f,
+	HNS_ROCE_CQE_OPCODE_MASK	= 0xf,
+};
+
+enum {
+	HNS_ROCE_CQE_SUCCESS,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR,
+	HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR,
+	HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR,
+	HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR,
+	HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR,
+	HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR,
+};
+
+struct hns_roce_cq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+#define CQ_DB_U32_4_CONS_IDX_S 0
+#define CQ_DB_U32_4_CONS_IDX_M   (((1UL << 16) - 1) << CQ_DB_U32_4_CONS_IDX_S)
+
+#define CQ_DB_U32_8_CQN_S 0
+#define CQ_DB_U32_8_CQN_M   (((1UL << 16) - 1) << CQ_DB_U32_8_CQN_S)
+
+#define CQ_DB_U32_8_NOTIFY_TYPE_S 16
+
+#define CQ_DB_U32_8_CMD_MDF_S 24
+#define CQ_DB_U32_8_CMD_MDF_M   (((1UL << 4) - 1) << CQ_DB_U32_8_CMD_MDF_S)
+
+#define CQ_DB_U32_8_CMD_S 28
+#define CQ_DB_U32_8_CMD_M   (((1UL << 3) - 1) << CQ_DB_U32_8_CMD_S)
+
+#define CQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_cqe {
+	unsigned int cqe_byte_4;
+	union {
+		unsigned int r_key;
+		unsigned int immediate_data;
+	};
+	unsigned int byte_cnt;
+	unsigned int cqe_byte_16;
+	unsigned int cqe_byte_20;
+	unsigned int s_mac_l;
+	unsigned int cqe_byte_28;
+	unsigned int reserved;
+};
+#define CQE_BYTE_4_OPERATION_TYPE_S 0
+#define CQE_BYTE_4_OPERATION_TYPE_M   \
+	(((1UL << 4) - 1) << CQE_BYTE_4_OPERATION_TYPE_S)
+
+#define CQE_BYTE_4_OWNER_S 7
+
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_S 8
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_M   \
+	(((1UL << 5) - 1) << CQE_BYTE_4_STATUS_OF_THE_OPERATION_S)
+
+#define CQE_BYTE_4_SQ_RQ_FLAG_S 14
+
+#define CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S 15
+
+#define CQE_BYTE_4_WQE_INDEX_S 16
+#define CQE_BYTE_4_WQE_INDEX_M	(((1UL << 14) - 1) << CQE_BYTE_4_WQE_INDEX_S)
+
+#define CQE_BYTE_16_LOCAL_QPN_S 0
+#define CQE_BYTE_16_LOCAL_QPN_M	(((1UL << 24) - 1) << CQE_BYTE_16_LOCAL_QPN_S)
+
+#define ROCEE_DB_SQ_L_0_REG				0x230
+
+#define ROCEE_DB_OTHERS_L_0_REG				0x238
+
+struct hns_roce_rc_send_wqe {
+	unsigned int sgl_ba_31_0;
+	unsigned int u32_1;
+	union {
+		unsigned int r_key;
+		unsigned int immediate_data;
+	};
+	unsigned int msg_length;
+	unsigned int rvd_3;
+	unsigned int rvd_4;
+	unsigned int rvd_5;
+	unsigned int rvd_6;
+	uint64_t     va0;
+	unsigned int l_key0;
+	unsigned int length0;
+
+	uint64_t     va1;
+	unsigned int l_key1;
+	unsigned int length1;
+};
+
+#endif /* _HNS_ROCE_U_HW_V1_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 249d1aa..077cddc 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -40,6 +40,8 @@
 #include <unistd.h>
 
 #include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+#include "hns_roce_u_hw_v1.h"
 
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr)
@@ -150,3 +152,117 @@ int hns_roce_u_dereg_mr(struct ibv_mr *mr)
 
 	return ret;
 }
+
+static int align_cq_size(int req)
+{
+	int nent;
+
+	for (nent = HNS_ROCE_MIN_CQE_NUM; nent < req; nent <<= 1)
+		;
+
+	return nent;
+}
+
+static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
+{
+	if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
+		fprintf(stderr, "cqe = %d, less than minimum CQE number.\n",
+			*cqe);
+		*cqe = HNS_ROCE_MIN_CQE_NUM;
+	}
+
+	if (*cqe > context->max_cqe)
+		return -1;
+
+	return 0;
+}
+
+static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
+				 struct hns_roce_buf *buf, int nent)
+{
+	if (hns_roce_alloc_buf(buf,
+			align(nent * HNS_ROCE_CQE_ENTRY_SIZE, dev->page_size),
+			dev->page_size))
+		return -1;
+	memset(buf->buf, 0, nent * HNS_ROCE_CQE_ENTRY_SIZE);
+
+	return 0;
+}
+
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector)
+{
+	struct hns_roce_create_cq	cmd;
+	struct hns_roce_create_cq_resp	resp;
+	struct hns_roce_cq		*cq;
+	int				ret;
+
+	if (hns_roce_verify_cq(&cqe, to_hr_ctx(context)))
+		return NULL;
+
+	cq = malloc(sizeof(*cq));
+	if (!cq)
+		return NULL;
+
+	cq->cons_index = 0;
+
+	if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
+		goto err;
+
+	cqe = align_cq_size(cqe);
+
+	if (hns_roce_alloc_cq_buf(to_hr_dev(context->device), &cq->buf, cqe))
+		goto err;
+
+	cmd.buf_addr = (uintptr_t) cq->buf.buf;
+
+	ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
+				&cq->ibv_cq, &cmd.ibv_cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp));
+	if (ret)
+		goto err_db;
+
+	cq->cqn = resp.cqn;
+	cq->cq_depth = cqe;
+
+	if (to_hr_dev(context->device)->hw_version == HNS_ROCE_HW_VER1)
+		cq->set_ci_db = to_hr_ctx(context)->cq_tptr_base + cq->cqn * 2;
+	else
+		cq->set_ci_db = to_hr_ctx(context)->uar +
+				ROCEE_DB_OTHERS_L_0_REG;
+
+	cq->arm_db    = cq->set_ci_db;
+	cq->arm_sn    = 1;
+	*(cq->set_ci_db) = 0;
+	*(cq->arm_db) = 0;
+
+	return &cq->ibv_cq;
+
+err_db:
+	hns_roce_free_buf(&cq->buf);
+
+err:
+	free(cq);
+
+	return NULL;
+}
+
+void hns_roce_u_cq_event(struct ibv_cq *cq)
+{
+	to_hr_cq(cq)->arm_sn++;
+}
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq)
+{
+	int ret;
+
+	ret = ibv_cmd_destroy_cq(cq);
+	if (ret)
+		return ret;
+
+	hns_roce_free_buf(&to_hr_cq(cq)->buf);
+	free(to_hr_cq(cq));
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 3/7] libhns: Add verbs of pd and mr support
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the verbs with pd and mr,
included alloc_pd, dealloc_pd, reg_mr and dereg_mr.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- No change over v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  4 ++
 providers/hns/hns_roce_u.h       | 18 +++++++++
 providers/hns/hns_roce_u_abi.h   |  6 +++
 providers/hns/hns_roce_u_verbs.c | 79 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index c0f6fe9..53e2720 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -97,6 +97,10 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 
 	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
 	context->ibv_ctx.ops.query_port    = hns_roce_u_query_port;
+	context->ibv_ctx.ops.alloc_pd	   = hns_roce_u_alloc_pd;
+	context->ibv_ctx.ops.dealloc_pd    = hns_roce_u_free_pd;
+	context->ibv_ctx.ops.reg_mr	   = hns_roce_u_reg_mr;
+	context->ibv_ctx.ops.dereg_mr	   = hns_roce_u_dereg_mr;
 
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index aa58ee6..5b73794 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -73,6 +73,11 @@ struct hns_roce_context {
 	int				max_cqe;
 };
 
+struct hns_roce_pd {
+	struct ibv_pd			ibv_pd;
+	unsigned int			pdn;
+};
+
 static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
 {
 	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -83,8 +88,21 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
 	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
 }
 
+static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
+{
+	return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
 			  struct ibv_port_attr *attr);
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context);
+int hns_roce_u_free_pd(struct ibv_pd *pd);
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+				 int access);
+int hns_roce_u_dereg_mr(struct ibv_mr *mr);
+
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 4bfc8fa..0a0cd0c 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -40,4 +40,10 @@ struct hns_roce_alloc_ucontext_resp {
 	__u32				qp_tab_size;
 };
 
+struct hns_roce_alloc_pd_resp {
+	struct ibv_alloc_pd_resp	ibv_resp;
+	__u32				pdn;
+	__u32				reserved;
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index be55fe8..249d1aa 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -71,3 +71,82 @@ int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
 
 	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
 }
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context)
+{
+	struct ibv_alloc_pd cmd;
+	struct hns_roce_pd *pd;
+	struct hns_roce_alloc_pd_resp resp;
+
+	pd = (struct hns_roce_pd *)malloc(sizeof(*pd));
+	if (!pd)
+		return NULL;
+
+	if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof(cmd),
+			     &resp.ibv_resp, sizeof(resp))) {
+		free(pd);
+		return NULL;
+	}
+
+	pd->pdn = resp.pdn;
+
+	return &pd->ibv_pd;
+}
+
+int hns_roce_u_free_pd(struct ibv_pd *pd)
+{
+	int ret;
+
+	ret = ibv_cmd_dealloc_pd(pd);
+	if (ret)
+		return ret;
+
+	free(to_hr_pd(pd));
+
+	return ret;
+}
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+				 int access)
+{
+	int ret;
+	struct ibv_mr *mr;
+	struct ibv_reg_mr cmd;
+	struct ibv_reg_mr_resp resp;
+
+	if (addr == NULL) {
+		fprintf(stderr, "2nd parm addr is NULL!\n");
+		return NULL;
+	}
+
+	if (length == 0) {
+		fprintf(stderr, "3st parm length is 0!\n");
+		return NULL;
+	}
+
+	mr = malloc(sizeof(*mr));
+	if (mr)
+		return NULL;
+
+	ret = ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr, access, mr,
+			     &cmd, sizeof(cmd), &resp, sizeof(resp));
+	if (ret) {
+		free(mr);
+		return NULL;
+	}
+
+	return mr;
+}
+
+int hns_roce_u_dereg_mr(struct ibv_mr *mr)
+{
+	int ret;
+
+	ret = ibv_cmd_dereg_mr(mr);
+	if (ret)
+		return ret;
+
+	free(mr);
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 2/7] libhns: Add verbs of querying device and querying port
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces query verbs for querying device
and querying port.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  7 ++++
 providers/hns/hns_roce_u.h       |  4 +++
 providers/hns/hns_roce_u_verbs.c | 73 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)
 create mode 100644 providers/hns/hns_roce_u_verbs.c

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index bda4dd8..c0f6fe9 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -95,12 +95,19 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
+	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
+	context->ibv_ctx.ops.query_port    = hns_roce_u_query_port;
+
+	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
+		goto tptr_free;
+
 	context->max_qp_wr = dev_attrs.max_qp_wr;
 	context->max_sge = dev_attrs.max_sge;
 	context->max_cqe = dev_attrs.max_cqe;
 
 	return &context->ibv_ctx;
 
+tptr_free:
 err_free:
 	free(context);
 	return NULL;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 3eef171..aa58ee6 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -83,4 +83,8 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
 	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
 }
 
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr);
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr);
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
new file mode 100644
index 0000000..be55fe8
--- /dev/null
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr)
+{
+	int ret;
+	struct ibv_query_device cmd;
+	unsigned long raw_fw_ver;
+	unsigned int major, minor, sub_minor;
+
+	ret = ibv_cmd_query_device(context, attr, &raw_fw_ver, &cmd,
+				   sizeof(cmd));
+	if (ret)
+		return ret;
+
+	major	   = (raw_fw_ver >> 32) & 0xffff;
+	minor	   = (raw_fw_ver >> 16) & 0xffff;
+	sub_minor = raw_fw_ver & 0xffff;
+
+	snprintf(attr->fw_ver, sizeof(attr->fw_ver), "%d.%d.%03d", major, minor,
+		 sub_minor);
+
+	return 0;
+}
+
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr)
+{
+	struct ibv_query_port cmd;
+
+	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 1/7] libhns: Add initial main frame
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces initial main frame for
userspace library of hns.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c     | 163 +++++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u.h     |  86 ++++++++++++++++++++++
 providers/hns/hns_roce_u_abi.h |  43 +++++++++++
 3 files changed, 292 insertions(+)
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
new file mode 100644
index 0000000..bda4dd8
--- /dev/null
+++ b/providers/hns/hns_roce_u.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+
+#define HID_LEN			15
+#define DEV_MATCH_LEN		128
+
+static const struct {
+	char	 hid[HID_LEN];
+} acpi_table[] = {
+	{"acpi:HISI00D1:"},
+	{},
+};
+
+static const struct {
+	char	 compatible[DEV_MATCH_LEN];
+} dt_table[] = {
+	{"hisilicon,hns-roce-v1"},
+	{},
+};
+
+static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
+						  int cmd_fd)
+{
+	int i;
+	struct ibv_get_context cmd;
+	struct ibv_device_attr dev_attrs;
+	struct hns_roce_context *context;
+	struct hns_roce_alloc_ucontext_resp resp;
+	struct hns_roce_device *hr_dev = to_hr_dev(ibdev);
+
+	context = calloc(1, sizeof(*context));
+	if (!context)
+		return NULL;
+
+	context->ibv_ctx.cmd_fd = cmd_fd;
+	if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp)))
+		goto err_free;
+
+	context->num_qps = resp.qp_tab_size;
+	context->qp_table_shift = ffs(context->num_qps) - 1 -
+				  HNS_ROCE_QP_TABLE_BITS;
+	context->qp_table_mask = (1 << context->qp_table_shift) - 1;
+
+	pthread_mutex_init(&context->qp_table_mutex, NULL);
+	for (i = 0; i < HNS_ROCE_QP_TABLE_SIZE; ++i)
+		context->qp_table[i].refcnt = 0;
+
+	context->uar = mmap(NULL, to_hr_dev(ibdev)->page_size,
+			    PROT_READ | PROT_WRITE, MAP_SHARED, cmd_fd, 0);
+	if (context->uar == MAP_FAILED) {
+		fprintf(stderr, PFX "Warning: failed to mmap() uar page.\n");
+		goto err_free;
+	}
+
+	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+
+	context->max_qp_wr = dev_attrs.max_qp_wr;
+	context->max_sge = dev_attrs.max_sge;
+	context->max_cqe = dev_attrs.max_cqe;
+
+	return &context->ibv_ctx;
+
+err_free:
+	free(context);
+	return NULL;
+}
+
+static void hns_roce_free_context(struct ibv_context *ibctx)
+{
+	struct hns_roce_context *context = to_hr_ctx(ibctx);
+
+	munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+
+	context->uar = NULL;
+
+	free(context);
+	context = NULL;
+}
+
+static struct ibv_device_ops hns_roce_dev_ops = {
+	.alloc_context = hns_roce_alloc_context,
+	.free_context	= hns_roce_free_context
+};
+
+static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
+					       int abi_version)
+{
+	struct hns_roce_device  *dev;
+	char			 value[128];
+	int			 i;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
+			if (!strcmp(value, acpi_table[i].hid))
+				goto found;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
+			if (!strcmp(value, dt_table[i].compatible))
+				goto found;
+
+	return NULL;
+
+found:
+	dev = malloc(sizeof(struct hns_roce_device));
+	if (!dev) {
+		fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n",
+			uverbs_sys_path);
+		return NULL;
+	}
+
+	dev->ibv_dev.ops = hns_roce_dev_ops;
+	dev->page_size   = sysconf(_SC_PAGESIZE);
+	return &dev->ibv_dev;
+}
+
+static __attribute__((constructor)) void hns_roce_register_driver(void)
+{
+	ibv_register_driver("hns", hns_roce_driver_init);
+}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
new file mode 100644
index 0000000..3eef171
--- /dev/null
+++ b/providers/hns/hns_roce_u.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_H
+#define _HNS_ROCE_U_H
+
+#include <stddef.h>
+
+#include <infiniband/driver.h>
+#include <infiniband/arch.h>
+#include <infiniband/verbs.h>
+#include <ccan/container_of.h>
+
+#define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
+#define PFX				"hns: "
+
+enum {
+	HNS_ROCE_QP_TABLE_BITS		= 8,
+	HNS_ROCE_QP_TABLE_SIZE		= 1 << HNS_ROCE_QP_TABLE_BITS,
+};
+
+struct hns_roce_device {
+	struct ibv_device		ibv_dev;
+	int				page_size;
+};
+
+struct hns_roce_context {
+	struct ibv_context		ibv_ctx;
+	void				*uar;
+	pthread_spinlock_t		uar_lock;
+
+	struct {
+		int			refcnt;
+	} qp_table[HNS_ROCE_QP_TABLE_SIZE];
+
+	pthread_mutex_t			qp_table_mutex;
+
+	int				num_qps;
+	int				qp_table_shift;
+	int				qp_table_mask;
+	unsigned int			max_qp_wr;
+	unsigned int			max_sge;
+	int				max_cqe;
+};
+
+static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
+{
+	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
+}
+
+static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
+{
+	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
+}
+
+#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
new file mode 100644
index 0000000..4bfc8fa
--- /dev/null
+++ b/providers/hns/hns_roce_u_abi.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_ABI_H
+#define _HNS_ROCE_U_ABI_H
+
+#include <infiniband/kern-abi.h>
+
+struct hns_roce_alloc_ucontext_resp {
+	struct ibv_get_context_resp	ibv_resp;
+	__u32				qp_tab_size;
+};
+
+#endif /* _HNS_ROCE_U_ABI_H */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 0/7] libhns: userspace library for hns
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA

This patch series introduces userspace library for hns RoCE driver.

changes v1 -> v2:
1. Delete the min() definition and instead of ccan header
2. Delete the CHECK_C_SOURCE_COMPILES
3. sort the c file in rdma_provider()
4. Delete the unused code in hns_roce_u_db.h

Lijun Ou (7):
  libhns: Add initial main frame
  libhns: Add verbs of querying device and querying port
  libhns: Add verbs of pd and mr support
  libhns: Add verbs of cq support
  libhns: Add verbs of qp support
  libhns: Add verbs of post_send and post_recv support
  libhns: Add consolidated repo for userspace library of hns

 CMakeLists.txt                   |   1 +
 MAINTAINERS                      |   6 +
 README.md                        |   1 +
 providers/hns/CMakeLists.txt     |   6 +
 providers/hns/hns_roce_u.c       | 228 +++++++++++
 providers/hns/hns_roce_u.h       | 255 ++++++++++++
 providers/hns/hns_roce_u_abi.h   |  69 ++++
 providers/hns/hns_roce_u_buf.c   |  61 +++
 providers/hns/hns_roce_u_db.h    |  54 +++
 providers/hns/hns_roce_u_hw_v1.c | 839 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h | 242 +++++++++++
 providers/hns/hns_roce_u_verbs.c | 525 ++++++++++++++++++++++++
 12 files changed, 2287 insertions(+)
 create mode 100644 providers/hns/CMakeLists.txt
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h
 create mode 100644 providers/hns/hns_roce_u_buf.c
 create mode 100644 providers/hns/hns_roce_u_db.h
 create mode 100644 providers/hns/hns_roce_u_hw_v1.c
 create mode 100644 providers/hns/hns_roce_u_hw_v1.h
 create mode 100644 providers/hns/hns_roce_u_verbs.c

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-29  1:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161028164030.GA17289-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/29 0:40, Jason Gunthorpe 写道:
> On Fri, Oct 28, 2016 at 03:59:47PM +0800, oulijun wrote:
> 
>> total 0
>> lrwxrwxrwx    1 root     root             0 Oct 27 11:07 driver -> ../../../../bus/platform/drivers/hns_roce
>>
>> but I think it is the standard approach. because my device(hip06) is
>> only platform device and the other device(hip07/hip0x0 will be pcie
>> device, it will be distinguished separately.  Hence, we adpot the
>> origin approach.
> 
> You have to parse out 'hns_roce' at the end of the readlink result and
> compare against that, drop the 'bus/platform/drivers' stuff
> 
> Your PCI and Platform device should both have the same driver name.
> 
> Jason
> 
> .
> 
Hi, Jason
   I could not express clearly. We hope that the only copy of libhns will be used for the diff
erent type chip(hip06, hip07, ...), and it can distinguish the hardware of these chip at the same time.
Hence, i think that your plan will not attain our demand.

We hope that the only one userspace library file named libhns-rdmav2.so will be used for the different hardware version(hip06, hip07, ...),
because there are only little change between their userspace drivers. So we need to distinguish hardware version.

We can't distinguish them if only matching driver name "hns_roce".

thanks
Lijun Ou

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().

This patch fixes a race condition: using queue_flag_clear_unlocked()
is not safe if any other function that manipulates the queue flags
can be called concurrently, e.g. blk_cleanup_queue().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fe15d94..45dd237 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 
 void nvme_requeue_req(struct request *req)
 {
-	unsigned long flags;
-
-	blk_mq_requeue_request(req, false);
-	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_queue_stopped(req->q))
-		blk_mq_kick_requeue_list(req->q);
-	spin_unlock_irqrestore(req->q->queue_lock, flags);
+	blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
@@ -2078,13 +2072,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		spin_lock_irq(ns->queue->queue_lock);
-		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
-		spin_unlock_irq(ns->queue->queue_lock);
-
+	list_for_each_entry(ns, &ctrl->namespaces, list)
 		blk_mq_quiesce_queue(ns->queue);
-	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -2095,7 +2084,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 13/14] nvme: Fix a race condition related to stopping queues
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8403996..fe15d94 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
 
-		blk_mq_stop_hw_queues(ns->queue);
+		blk_mq_quiesce_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Ensure that if scsi-mq is enabled that scsi_internal_device_block()
waits until ongoing shost->hostt->queuecommand() calls have finished.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ab9c87..1febc52 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2873,7 +2873,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 	 * request queue. 
 	 */
 	if (q->mq_ops) {
-		blk_mq_stop_hw_queues(q);
+		blk_mq_quiesce_queue(q);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Additionally, rename srp_wait_for_queuecommand() into
scsi_wait_for_queuecommand() and add a comment about the
queuecommand() call from scsi_send_eh_cmnd().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c           | 38 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_transport_srp.c | 41 ++++++---------------------------------
 2 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4f682c..3ab9c87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2721,6 +2721,39 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
 EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 
 /**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @sdev: SCSI device to count the number of scsi_request_fn() callers for.
+ */
+static int scsi_request_fn_active(struct scsi_device *sdev)
+{
+	struct request_queue *q = sdev->request_queue;
+	int request_fn_active;
+
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	spin_lock_irq(q->queue_lock);
+	request_fn_active = q->request_fn_active;
+	spin_unlock_irq(q->queue_lock);
+
+	return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
+ * @shost: SCSI host pointer.
+ *
+ * Wait until the ongoing shost->hostt->queuecommand() calls that are
+ * invoked from scsi_request_fn() have finished.
+ */
+static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
+{
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	while (scsi_request_fn_active(sdev))
+		msleep(20);
+}
+
+/**
  *	scsi_device_quiesce - Block user issued commands.
  *	@sdev:	scsi device to quiesce.
  *
@@ -2814,6 +2847,10 @@ EXPORT_SYMBOL(scsi_target_resume);
  *	(which must be a legal transition).  When the device is in this
  *	state, all commands are deferred until the scsi lld reenables
  *	the device with scsi_device_unblock or device_block_tmo fires.
+ *
+ * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
+ * scsi_internal_device_block() has blocked a SCSI device and also
+ * remove the rport mutex lock and unlock calls from srp_queuecommand().
  */
 int
 scsi_internal_device_block(struct scsi_device *sdev)
@@ -2841,6 +2878,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
 		spin_unlock_irqrestore(q->queue_lock, flags);
+		scsi_wait_for_queuecommand(sdev);
 	}
 
 	return 0;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..b48328a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,7 +24,6 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
-#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -402,36 +401,6 @@ static void srp_reconnect_work(struct work_struct *work)
 	}
 }
 
-/**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
-static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
-{
-	while (scsi_request_fn_active(shost))
-		msleep(20);
-}
-
 static void __rport_fail_io_fast(struct srp_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
@@ -441,14 +410,17 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
+	/*
+	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
+	 * calls before invoking i->f->terminate_rport_io().
+	 */
+	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
-	if (i->f->terminate_rport_io) {
-		srp_wait_for_queuecommand(shost);
+	if (i->f->terminate_rport_io)
 		i->f->terminate_rport_io(rport);
-	}
 }
 
 /**
@@ -576,7 +548,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	srp_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 10/14] dm: Fix a race condition related to stopping and starting queues
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have stopped before setting the "queue stopped" flag. This
allows to remove the "queue stopped" test from dm_mq_queue_rq() and
dm_mq_requeue_request(). This patch fixes a race condition because
dm_mq_queue_rq() is called without holding the queue lock and hence
BLK_MQ_S_STOPPED can be set at any time while dm_mq_queue_rq() is
in progress. This patch prevents that the following hang occurs
sporadically when using dm-mq:

INFO: task systemd-udevd:10111 blocked for more than 480 seconds.
Call Trace:
 [<ffffffff8161f397>] schedule+0x37/0x90
 [<ffffffff816239ef>] schedule_timeout+0x27f/0x470
 [<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff8161fb36>] bit_wait_io+0x16/0x60
 [<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0
 [<ffffffff8114fe69>] __lock_page+0xb9/0xc0
 [<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760
 [<ffffffff81166120>] truncate_inode_pages+0x10/0x20
 [<ffffffff81212a20>] kill_bdev+0x30/0x40
 [<ffffffff81213d41>] __blkdev_put+0x71/0x360
 [<ffffffff81214079>] blkdev_put+0x49/0x170
 [<ffffffff812141c0>] blkdev_close+0x20/0x30
 [<ffffffff811d48e8>] __fput+0xe8/0x1f0
 [<ffffffff811d4a29>] ____fput+0x9/0x10
 [<ffffffff810842d3>] task_work_run+0x83/0xb0
 [<ffffffff8106606e>] do_exit+0x3ee/0xc40
 [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
 [<ffffffff81073d9a>] get_signal+0x2ca/0x940
 [<ffffffff8101bf43>] do_signal+0x23/0x660
 [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
 [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
 [<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-rq.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 0103031..f9f37ad 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -102,7 +102,7 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	if (blk_mq_queue_stopped(q))
 		return;
 
-	blk_mq_stop_hw_queues(q);
+	blk_mq_quiesce_queue(q);
 }
 
 void dm_stop_queue(struct request_queue *q)
@@ -883,17 +883,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
-	/*
-	 * On suspend dm_stop_queue() handles stopping the blk-mq
-	 * request_queue BUT: even though the hw_queues are marked
-	 * BLK_MQ_S_STOPPED at that point there is still a race that
-	 * is allowing block/blk-mq.c to call ->queue_rq against a
-	 * hctx that it really shouldn't.  The following check guards
-	 * against this rarity (albeit _not_ race-free).
-	 */
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
-		return BLK_MQ_RQ_QUEUE_BUSY;
-
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 09/14] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
in the dm start and stop queue functions, only manipulate the latter
flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/dm-rq.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 4b62e74..0103031 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -75,12 +75,6 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
 	blk_mq_start_stopped_hw_queues(q, true);
 	blk_mq_kick_requeue_list(q);
 }
@@ -105,16 +99,8 @@ static void dm_old_stop_queue(struct request_queue *q)
 
 static void dm_mq_stop_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (blk_queue_stopped(q)) {
-		spin_unlock_irqrestore(q->queue_lock, flags);
+	if (blk_mq_queue_stopped(q))
 		return;
-	}
-
-	queue_flag_set(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	blk_mq_stop_hw_queues(q);
 }
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 08/14] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Bart Van Assche @ 2016-10-29  0:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
 block/blk-flush.c            |  5 +----
 block/blk-mq.c               | 10 +++++++---
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c           |  2 +-
 drivers/nvme/host/core.c     |  2 +-
 drivers/scsi/scsi_lib.c      |  4 +---
 include/linux/blk-mq.h       |  5 +++--
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3c882cb..7e9950b 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -134,10 +134,7 @@ static void blk_flush_restore_request(struct request *rq)
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
 	if (rq->q->mq_ops) {
-		struct request_queue *q = rq->q;
-
-		blk_mq_add_to_requeue_list(rq, add_front);
-		blk_mq_kick_requeue_list(q);
+		blk_mq_add_to_requeue_list(rq, add_front, true);
 		return false;
 	} else {
 		if (add_front)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 96015a9..b7753ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -494,12 +494,12 @@ static void __blk_mq_requeue_request(struct request *rq)
 	}
 }
 
-void blk_mq_requeue_request(struct request *rq)
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	__blk_mq_requeue_request(rq);
 
 	BUG_ON(blk_queued_rq(rq));
-	blk_mq_add_to_requeue_list(rq, true);
+	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
@@ -533,7 +533,8 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
 	unsigned long flags;
@@ -552,6 +553,9 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 		list_add_tail(&rq->queuelist, &q->requeue_list);
 	}
 	spin_unlock_irqrestore(&q->requeue_lock, flags);
+
+	if (kick_requeue_list)
+		blk_mq_kick_requeue_list(q);
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 60fff99..a3e1727 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2043,7 +2043,7 @@ static int blkif_recover(struct blkfront_info *info)
 		/* Requeue pending requests (flush or discard) */
 		list_del_init(&req->queuelist);
 		BUG_ON(req->nr_phys_segments > segs);
-		blk_mq_requeue_request(req);
+		blk_mq_requeue_request(req, false);
 	}
 	blk_mq_start_stopped_hwqueues(info->rq);
 	blk_mq_kick_requeue_list(info->rq);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2b82496..4b62e74 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(dm_mq_kick_requeue_list);
 
 static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
 {
-	blk_mq_requeue_request(rq);
+	blk_mq_requeue_request(rq, false);
 	__dm_mq_kick_requeue_list(rq->q, msecs);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab5f59e..8403996 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -203,7 +203,7 @@ void nvme_requeue_req(struct request *req)
 {
 	unsigned long flags;
 
-	blk_mq_requeue_request(req);
+	blk_mq_requeue_request(req, false);
 	spin_lock_irqsave(req->q->queue_lock, flags);
 	if (!blk_queue_stopped(req->q))
 		blk_mq_kick_requeue_list(req->q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 126a784..b4f682c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -86,10 +86,8 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
 {
 	struct scsi_device *sdev = cmd->device;
-	struct request_queue *q = cmd->request->q;
 
-	blk_mq_requeue_request(cmd->request);
-	blk_mq_kick_requeue_list(q);
+	blk_mq_requeue_request(cmd->request, true);
 	put_device(&sdev->sdev_gendev);
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ed20ac7..35a0af5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -218,8 +218,9 @@ void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, int error);
 void __blk_mq_end_request(struct request *rq, int error);
 
-void blk_mq_requeue_request(struct request *rq);
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_abort_requeue_list(struct request_queue *q);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 07/14] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-29  0:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
  .queue_rq() calls. The former is used if .queue_rq() does not
  block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
  followed by synchronize_srcu() or synchronize_rcu(). The latter
  call waits for .queue_rq() invocations that started before
  blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
  .queue_rq() will be called are called with the (S)RCU read lock
  held. This is necessary to avoid race conditions against
  blk_mq_quiesce_queue().

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/Kconfig          |  1 +
 block/blk-mq.c         | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 4 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
        bool "Enable the block layer" if EXPERT
        default y
        select SBITMAP
+       select SRCU
        help
 	 Provide block layer support for the kernel.
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 534128a..96015a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,33 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	bool rcu = false;
+
+	blk_mq_stop_hw_queues(q);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			synchronize_srcu(&hctx->queue_rq_srcu);
+		else
+			rcu = true;
+	}
+	if (rcu)
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -768,7 +795,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct request *rq;
@@ -780,9 +807,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
-		cpu_online(hctx->next_cpu));
-
 	hctx->run++;
 
 	/*
@@ -873,6 +897,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+	int srcu_idx;
+
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+		cpu_online(hctx->next_cpu));
+
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+		rcu_read_lock();
+		blk_mq_process_rq_list(hctx);
+		rcu_read_unlock();
+	} else {
+		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+		blk_mq_process_rq_list(hctx);
+		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+	}
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1283,7 +1325,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
-	unsigned int request_count = 0;
+	unsigned int request_count = 0, srcu_idx;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1326,7 +1368,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1346,7 +1388,16 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+
+		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
+			rcu_read_lock();
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			rcu_read_unlock();
+		} else {
+			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+		}
 		goto done;
 	}
 
@@ -1625,6 +1676,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
 	blk_mq_remove_cpuhp(hctx);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
@@ -1705,6 +1759,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 				   flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		init_srcu_struct(&hctx->queue_rq_srcu);
+
 	return 0;
 
  free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a85a20f..ed20ac7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -35,6 +36,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct srcu_struct	queue_rq_srcu;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox