Linux RDMA and InfiniBand development
* Re: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-31 15:10 UTC (permalink / raw)
  To: loberman@redhat.com
  Cc: linux-block@vger.kernel.org, tom.leiming@gmail.com,
	linux-rdma@vger.kernel.org, snitzer@redhat.com, hch@lst.de,
	martin.petersen@oracle.com, konrad.wilk@oracle.com,
	roger.pau@citrix.com, axboe@fb.com, linux-scsi@vger.kernel.org,
	linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
	keith.busch@intel.com, dledford@redhat.com
In-Reply-To: <540193784.5466628.1477921998345.JavaMail.zimbra@redhat.com>

On Mon, 2016-10-31 at 09:53 -0400, Laurence Oberman wrote:
> Applied all 14 patches, also corrected the part of the xen-blkfront.c 
> blkif_recover patch in patchv5-5/14.
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9908597..60fff99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
>                  BUG_ON(req->nr_phys_segments > segs);
>                  blk_mq_requeue_request(req);
>          }
> +        blk_mq_start_stopped_hw_queues(info->rq, true);    *** Corrected
>          blk_mq_kick_requeue_list(info->rq);
>  
>          while ((bio = bio_list_pop(&info->bio_list)) != NULL) {

Hello Laurence,

Sorry for the build failure. The way you changed xen-blkfront is indeed
what I intended. Apparently I forgot to enable Xen in my kernel config
...

Bart.


* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Or Gerlitz @ 2016-10-31 15:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	skc-YOWKrPYUwWM, Weiny, Ira, Jason Gunthorpe, John Fleck,
	Leon Romanovsky, Liran Liss, Matan Barak, Tzahi Oved
In-Reply-To: <alpine.DEB.2.20.1610301848560.15769-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>

On Mon, Oct 31, 2016 at 1:49 AM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:

> If you were invited to the KS or LPC then you can attend the RDMA
> workshop.

AFAIK LPC is by registration, not by invitation, so all LPC attendees
can come, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Christoph Lameter @ 2016-10-31 15:33 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	skc-YOWKrPYUwWM, Weiny, Ira, Jason Gunthorpe, John Fleck,
	Leon Romanovsky, Liran Liss, Matan Barak, Tzahi Oved
In-Reply-To: <CAJ3xEMh-Q_6tRV674vhNE5o4DYiECbpSEGL_au-jjTM4bUt0Lw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


On Mon, 31 Oct 2016, Or Gerlitz wrote:

> On Mon, Oct 31, 2016 at 1:49 AM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:
>
> > If you were invited to the KS or LPC then you can attend the RDMA
> > workshop.
>
> AFAIK LPC is by registration, not by invitation, so all LPC attendees
> can come, right?
>

Correct. There was a limit on the number of registrations, and a number of
additional spots were available with a special invite.


* Re: iscsi_trx going into D state
From: Robert LeBlanc @ 2016-10-31 16:34 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Zhu Lingshan, linux-rdma, linux-scsi-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477780190.22703.47.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>

Nicholas,

Thanks for following up on this. We have been chasing other bugs in
our provisioning, and as a result have reduced our load on the boxes. We
are hoping to get that all straightened out this week and do some more
testing. So far we have not had any iSCSI in D state since the patch,
but we haven't been able to test it well either. We will keep you
updated.

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Sat, Oct 29, 2016 at 4:29 PM, Nicholas A. Bellinger
<nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
> Hi Robert,
>
> On Wed, 2016-10-19 at 10:41 -0600, Robert LeBlanc wrote:
>> Nicholas,
>>
>> I didn't have high hopes for the patch because we were not seeing
>> TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it
>> seemed to help regardless. Our clients finally OOMed from the hung
>> sessions, so we are having to reboot them and we will do some more
>> testing. We haven't put the updated kernel on our clients yet. Our
>> clients have iSCSI root disks so I'm not sure if we can get a vmcore
>> on those, but we will do what we can to get you a vmcore from the
>> target if it happens again.
>>
>
> Just checking in to see if you've observed further issues with
> iser-target ports, and/or able to generate a crashdump with v4.4.y..?
>
>> As far as our configuration: It is a superMicro box with 6 SAMSUNG
>> MZ7LM3T8HCJM-00005 SSDs. Two are for root and four are in mdadm
>> RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the
>> RAID-10 for checksum and snapshots only and we export ZVols to the
>> clients (one or more per VM on the client). We do not persist the
>> export info (targetcli saveconfig), but regenerate it from scripts.
>> The client receives two or more of these exports and puts them in a
>> RAID-1 device. The exports are served by iSER on one port and also by
>> normal iSCSI on a different port for compatibility, but not normally
>> used. If you need more info about the config, please let me know. It
>> was kind of a vague request so I'm not sure what exactly is important
>> to you.
>
> Thanks for the extra details of your hardware + user-space
> configuration.
>
>> Thanks for helping us with this,
>> Robert LeBlanc
>>
>> When we have problems, we usually see this in the logs:
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on
>> Network Portal 0.0.0.0:3260
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48.
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed.
>>
>> I found some backtraces in the logs, not sure if this is helpful, this
>> is before your patch (your patch booted at Oct 18 10:36:59):
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this
>> GP) idle=b59/140000000000001/0 softirq=535/535 fqs=30992
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550
>> c=1549 q=0)
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5:
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2   R  running
>> task        0 17967      2 0x00000008
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq
>> isert_cq_work [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: ffff883f4c0dca80
>> 00000000af8ca7a4 ffff883f7fb43da8 ffffffff810ac83f
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000005
>> ffffffff81adb680 ffff883f7fb43dc0 ffffffff810af179
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000006
>> ffff883f7fb43df0 ffffffff810e1c10 ffff883f7fb57b80
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac83f>]
>> sched_show_task+0xaf/0x110
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810af179>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e1c10>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e6040>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810afd51>] ?
>> account_system_time+0x81/0x110
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9a0>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810eb4d9>]
>> update_process_times+0x39/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa755>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9dd>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec0c2>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec5a8>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8172343d>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff817216f7>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d70fe>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d74bc>]
>> vprintk_emit+0x2fc/0x500
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d783f>]
>> vprintk_default+0x1f/0x30
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81174c2a>] printk+0x5d/0x74
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814bc351>]
>> transport_lookup_cmd_lun+0x1d1/0x200
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814edcf0>]
>> iscsit_setup_scsi_cmd+0x230/0x540
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0890bf3>]
>> isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0891174>]
>> isert_cq_work+0x184/0x770 [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109734f>]
>> process_one_work+0x14f/0x400
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097bc4>]
>> worker_thread+0x114/0x470
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8171c55a>] ?
>> __schedule+0x34a/0x7f0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097ab0>] ?
>> rescuer_thread+0x310/0x310
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d708>] kthread+0xd8/0xf0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>> kthread_park+0x60/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81720c8f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>> kthread_park+0x60/0x60
>>
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #01128-...: (5999 ticks this
>> GP) idle=2f9/140000000000001/0 softirq=457/457 fqs=4830
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=3546
>> c=3545 q=0)
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Task dump for CPU 28:
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: iscsi_np        R  running
>> task        0 16597      2 0x0000000c
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: ffff887f40350000
>> 00000000b98a67bb ffff887f7f503da8 ffffffff810ac8ff
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001c
>> ffffffff81adb680 ffff887f7f503dc0 ffffffff810af239
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001d
>> ffff887f7f503df0 ffffffff810e1cd0 ffff887f7f517b80
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>> sched_show_task+0xaf/0x110
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>> account_system_time+0x81/0x110
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>> update_process_times+0x39/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d757c>]
>> vprintk_emit+0x2fc/0x500
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d78ff>]
>> vprintk_default+0x1f/0x30
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81174dde>] printk+0x5d/0x74
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e71ad>]
>> iscsi_target_locate_portal+0x62d/0x6f0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e5100>]
>> iscsi_target_login_thread+0x6f0/0xfc0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e4a10>] ?
>> iscsi_target_login_sess_out+0x250/0x250
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>>
>> I don't think this one is related, but it happened a couple of times:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #01119-...: (5999 ticks this
>> GP) idle=727/140000000000001/0 softirq=1346/1346 fqs=4990
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=4295
>> c=4294 q=0)
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Task dump for CPU 19:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: kworker/19:1    R  running
>> task        0   301      2 0x00000008
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Workqueue:
>> events_power_efficient fb_flashcursor
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: ffff883f6009ca80
>> 00000000010a7cdd ffff883f7fcc3da8 ffffffff810ac8ff
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000013
>> ffffffff81adb680 ffff883f7fcc3dc0 ffffffff810af239
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000014
>> ffff883f7fcc3df0 ffffffff810e1cd0 ffff883f7fcd7b80
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>> sched_show_task+0xaf/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>> account_system_time+0x81/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>> update_process_times+0x39/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff813866ad>]
>> fb_flashcursor+0x5d/0x140
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8138bc00>] ?
>> bit_clear+0x110/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109740f>]
>> process_one_work+0x14f/0x400
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097c84>]
>> worker_thread+0x114/0x470
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8171cdda>] ?
>> __schedule+0x34a/0x7f0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097b70>] ?
>> rescuer_thread+0x310/0x310
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>
> RCU self-detected schedule stalls typically mean some code is
> monopolizing execution on a specific CPU for an extended period of time
> (eg: endless loop), preventing normal RCU grace-period callbacks from
> running in a timely manner.
>
> It's hard to tell without more log context and/or crashdump what was
> going on here.
>

* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Paolo Bonzini @ 2016-10-31 17:55 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins,
	Dave Hansen, Rik van Riel, Mel Gorman, Andrew Morton,
	linux-kernel, linux-cris-kernel, linux-ia64, dri-devel,
	linux-rdma, kvm, linux-media, devel
In-Reply-To: <20161031134849.GA13609@lucifer>



On 31/10/2016 14:48, Lorenzo Stoakes wrote:
> On Mon, Oct 31, 2016 at 12:45:36PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 31/10/2016 11:02, Lorenzo Stoakes wrote:
>>> - *
>>> - * get_user_pages should be phased out in favor of
>>> - * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
>>> - * should use get_user_pages because it cannot pass
>>> - * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
>>
>> This comment should be preserved in some way.  In addition, removing
> 
> Could you clarify what you think should be retained?
> 
> The comment seems to me to be largely rendered redundant by this change -
> get_user_pages() now offers identical behaviour, and of course the latter part
> of the comment ('because it cannot pass FAULT_FLAG_ALLOW_RETRY') is rendered
> incorrect by this change.
> 
> It could be replaced with a recommendation to make use of VM_FAULT_RETRY logic
> if possible.

Yes, exactly.  locked == NULL should be avoided whenever mmap_sem can 
be dropped, and the comment indirectly said so.  Now most of those cases
actually are those where you can just call get_user_pages_unlocked.

>> get_user_pages_locked() makes it harder (compared to a simple "git grep
>> -w") to identify callers that lack allow-retry functionality).  So I'm
>> not sure about the benefits of these patches.
> 
> That's a fair point, and certainly this cleanup series is less obviously helpful
> than previous ones.
> 
> However, there are a few points in its favour:
> 
> 1. get_user_pages_remote() was the mirror of get_user_pages() prior to adding an
>    int *locked parameter to the former (necessary to allow for the unexport of
>    __get_user_pages_unlocked()), differing only in task/mm being specified and
>    FOLL_REMOTE being set. This patch series keeps these functions 'synchronised'
>    in this respect.
> 
> 2. There is currently only one caller of get_user_pages_locked() in
>    mm/frame_vector.c which seems to suggest this function isn't widely
>    used/known.

Or not widely necessary. :)

> 3. This change results in all slow-path get_user_pages*() functions having the
>    ability to use VM_FAULT_RETRY logic rather than 'defaulting' to using
>    get_user_pages() that doesn't let you do this even if you wanted to.

This is only true if someone does the work though.  From a quick look 
at your series, the following files can be trivially changed to use 
get_user_pages_unlocked:

- drivers/gpu/drm/via/via_dmablit.c
- drivers/platform/goldfish/goldfish_pipe.c
- drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
- drivers/rapidio/devices/rio_mport_cdev.c
- drivers/virt/fsl_hypervisor.c

Also, videobuf_dma_init_user could be changed to use retry by adding a 
*locked argument to videobuf_dma_init_user_locked, prototype patch 
after my signature.

Everything else is probably best kept using get_user_pages.

> 4. It's somewhat confusing/redundant having both get_user_pages_locked() and
>    get_user_pages() functions which both require mmap_sem to be held (i.e. both
>    are 'locked' versions.)
> 
>> If all callers were changed, then sure removing the _locked suffix would
>> be a good idea.
> 
> It seems many callers cannot release mmap_sem so VM_FAULT_RETRY logic couldn't
> happen anyway in this cases and in these cases we couldn't change the caller.

But then get_user_pages_locked remains a less-common case, so perhaps 
it's a good thing to give it a longer name!

> Overall, an alternative here might be to remove get_user_pages() and update
> get_user_pages_locked() to add a 'vmas' parameter, however doing this renders
> get_user_pages_unlocked() asymmetric as it would lack an vmas parameter (adding
> one there would make no sense as VM_FAULT_RETRY logic invalidates VMAs) though
> perhaps not such a big issue as it makes sense as to why this is the case.

Adding the 'vmas' parameter to get_user_pages_locked would make little 
sense.  Since VM_FAULT_RETRY invalidates it and g_u_p_locked can and 
does retry, it would generally not be useful.

So I think we have the right API now:

- do not have lock?  Use get_user_pages_unlocked, get retry for free,
no need to handle mmap_sem and the locked argument; cannot get back vmas.

- have and cannot drop lock?  Use get_user_pages, no need to pass NULL 
for the locked argument; can get back vmas.

- have but can drop lock (rare case)?  Use get_user_pages_locked, 
cannot get back vmas.
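
The three cases above can be sketched as a small decision helper. This is
only a userspace model, assuming hypothetical names (choose_gup, struct
gup_choice, enum gup_variant) that are not kernel APIs; it encodes just what
the list says about retry and vmas for each variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only; these names are not kernel APIs. */
enum gup_variant { GUP_UNLOCKED, GUP_PLAIN, GUP_LOCKED };

struct gup_choice {
	enum gup_variant variant;
	bool can_retry;	/* VM_FAULT_RETRY handling is available */
	bool gets_vmas;	/* caller can get vmas back */
};

/* Map the caller's locking situation to the variant suggested above. */
static struct gup_choice choose_gup(bool holds_mmap_sem, bool can_drop_mmap_sem)
{
	if (!holds_mmap_sem)
		return (struct gup_choice){ GUP_UNLOCKED, true, false };
	if (!can_drop_mmap_sem)
		return (struct gup_choice){ GUP_PLAIN, false, true };
	return (struct gup_choice){ GUP_LOCKED, true, false };
}

/* Name of the kernel function each model value stands for. */
static const char *gup_variant_name(enum gup_variant v)
{
	switch (v) {
	case GUP_UNLOCKED:
		return "get_user_pages_unlocked";
	case GUP_PLAIN:
		return "get_user_pages";
	case GUP_LOCKED:
		return "get_user_pages_locked";
	}
	assert(!"unknown gup_variant");
	return NULL;
}
```

For example, choose_gup(true, false) selects plain get_user_pages, the only
variant in this model that can hand back vmas.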

Paolo

> get_user_pages_unlocked() definitely seems to be a useful helper and therefore
> makes sense to retain.

> Of course another alternative is to leave things be :)
> 

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 1db0af6c7f94..dae4eb8e9d5b 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -152,7 +152,8 @@ static void videobuf_dma_init(struct videobuf_dmabuf *dma)
 }
 
 static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
-			int direction, unsigned long data, unsigned long size)
+			int direction, unsigned long data, unsigned long size,
+			int *locked)
 {
 	unsigned long first, last;
 	int err, rw = 0;
@@ -185,8 +186,17 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
-			     flags, dma->pages, NULL);
+	if (locked && !*locked) {
+		down_read(&current->mm->mmap_sem);
+		*locked = 1;
+	}
+
+	/*
+	 * If the caller cannot have mmap_sem dropped, it passes locked == NULL
+	 * so get_user_pages_locked will not release it.
+	 */
+	err = get_user_pages_locked(data & PAGE_MASK, dma->nr_pages,
+				    flags, dma->pages, locked);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
@@ -200,10 +210,11 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
 			   unsigned long data, unsigned long size)
 {
 	int ret;
+	int locked = 0;
 
-	down_read(&current->mm->mmap_sem);
-	ret = videobuf_dma_init_user_locked(dma, direction, data, size);
-	up_read(&current->mm->mmap_sem);
+	ret = videobuf_dma_init_user_locked(dma, direction, data, size, &locked);
+	if (locked)
+		up_read(&current->mm->mmap_sem);
 
 	return ret;
 }
@@ -540,7 +551,7 @@ static int __videobuf_iolock(struct videobuf_queue *q,
 
 			err = videobuf_dma_init_user_locked(&mem->dma,
 						      DMA_FROM_DEVICE,
-						      vb->baddr, vb->bsize);
+						      vb->baddr, vb->bsize, NULL);
 			if (0 != err)
 				return err;
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Lorenzo Stoakes @ 2016-10-31 19:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-mm, Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins,
	Dave Hansen, Rik van Riel, Mel Gorman, Andrew Morton,
	linux-kernel, linux-cris-kernel, linux-ia64, dri-devel,
	linux-rdma, kvm, linux-media, devel
In-Reply-To: <ddbe34d0-5d29-abce-1627-f464635bbfe6@redhat.com>

On Mon, Oct 31, 2016 at 06:55:33PM +0100, Paolo Bonzini wrote:
> > 2. There is currently only one caller of get_user_pages_locked() in
> >    mm/frame_vector.c which seems to suggest this function isn't widely
> >    used/known.
>
> Or not widely necessary. :)

Well, quite :)
>
> > 3. This change results in all slow-path get_user_pages*() functions having the
> >    ability to use VM_FAULT_RETRY logic rather than 'defaulting' to using
> >    get_user_pages() that doesn't let you do this even if you wanted to.
>
> This is only true if someone does the work though.  From a quick look
> at your series, the following files can be trivially changed to use
> get_user_pages_unlocked:
>
> - drivers/gpu/drm/via/via_dmablit.c
> - drivers/platform/goldfish/goldfish_pipe.c
> - drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
> - drivers/rapidio/devices/rio_mport_cdev.c
> - drivers/virt/fsl_hypervisor.c
>

Ah indeed, I rather glossed over the callers and noticed that at least some
held locks long enough, or were called with a lock already held, such that I
wasn't sure it could be released.

> Also, videobuf_dma_init_user could be changed to use retry by adding a
> *locked argument to videobuf_dma_init_user_locked, prototype patch
> after my signature.
>

Very nice!

> Everything else is probably best kept using get_user_pages.
>
> > 4. It's somewhat confusing/redundant having both get_user_pages_locked() and
> >    get_user_pages() functions which both require mmap_sem to be held (i.e. both
> >    are 'locked' versions.)
> >
> >> If all callers were changed, then sure removing the _locked suffix would
> >> be a good idea.
> >
> > It seems many callers cannot release mmap_sem so VM_FAULT_RETRY logic couldn't
> > happen anyway in this cases and in these cases we couldn't change the caller.
>
> But then get_user_pages_locked remains a less-common case, so perhaps
> it's a good thing to give it a longer name!

My (somewhat minor) concern was that there would be confusion due to the
existence of the triumvirate of g_u_p()/g_u_p_unlocked()/g_u_p_locked(), however
the comments do a decent enough job of explaining the situation.

>
> > Overall, an alternative here might be to remove get_user_pages() and update
> > get_user_pages_locked() to add a 'vmas' parameter, however doing this renders
> > get_user_pages_unlocked() asymmetric as it would lack an vmas parameter (adding
> > one there would make no sense as VM_FAULT_RETRY logic invalidates VMAs) though
> > perhaps not such a big issue as it makes sense as to why this is the case.
>
> Adding the 'vmas' parameter to get_user_pages_locked would make little
> sense.  Since VM_FAULT_RETRY invalidates it and g_u_p_locked can and
> does retry, it would generally not be useful.

I meant only in the case where we'd remove get_user_pages() and instead be left
with get_user_pages_[un]locked() only, meaning we'd have to add some means of
retrieving vmas if locked was set to NULL. Of course, in cases where locked is
not NULL it makes no sense to add it.

>
> So I think we have the right API now:
>
> - do not have lock?  Use get_user_pages_unlocked, get retry for free,
> no need to handle mmap_sem and the locked argument; cannot get back vmas.
>
> - have and cannot drop lock?  Use get_user_pages, no need to pass NULL
> for the locked argument; can get back vmas.
>
> - have but can drop lock (rare case)?  Use get_user_pages_locked,
> cannot get back vmas.

Yeah I think this is sane as it is actually, this patch set was a lot less
convincing of a cleanup than prior ones and overall it seems we are better off
with the existing API.

I wonder whether a better patch series to come out of this would be to find
cases where the lock could be dropped (i.e. the ones you mention above) and to
switch to using get_user_pages_[un]locked() where it makes sense to.

I am happy to look into these cases (though of course I must leave your
suggested patch here to you :)


* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
From: Matan Barak @ 2016-10-31 22:58 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A47BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>

On Sun, Oct 30, 2016 at 9:28 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I found this patch very hard to follow.  This was in part due to the output of the patch command itself, but also because there lacked sufficient documentation on what the new data structures were for and the terms being used.  As a result, I had to bounce around the patch to figure things out, adding comments as I went along, until I finally just gave up trying to read it.

Actually, there are some helpful slides in the OFVWG presentations. I
guess it would be best to clarify the model in the commit message.

>
>> The new ioctl infrastructure supports driver specific objects.
>> Each such object type has a free function, allocation size and an
>
> You can replace the allocation size with an alloc function, to pair with the free call.  Then the object can be initialized by the user.
>

I had thought about that, but the user could initialize its part of
the object in the function handler. It can't allocate the object as we
need it in order to allocate an IDR entry and co. The assumption here
is that the "unlock" stage can't fail.
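
The constraint described in this reply (the object and its id must exist
before the handler runs, and the final stage must not fail) can be modelled
in userspace roughly as follows. All names here (model_ctx, model_uobj_alloc,
model_uobj_enable, ...) are hypothetical and a fixed array stands in for the
IDR; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_IDS 4	/* arbitrary size for the sketch */

struct model_uobj {
	int id;
	bool visible;	/* set only by the final "enable" stage */
};

struct model_ctx {
	struct model_uobj *slots[MODEL_MAX_IDS];	/* stands in for the IDR */
};

/* Stage 1: allocate the object and reserve its id. This stage may fail. */
static struct model_uobj *model_uobj_alloc(struct model_ctx *ctx)
{
	for (int id = 0; id < MODEL_MAX_IDS; id++) {
		if (!ctx->slots[id]) {
			struct model_uobj *obj = calloc(1, sizeof(*obj));

			if (!obj)
				return NULL;
			obj->id = id;
			ctx->slots[id] = obj;
			return obj;
		}
	}
	return NULL;	/* no free id */
}

/* Roll back stage 1, e.g. when the command handler fails. */
static void model_uobj_abort(struct model_ctx *ctx, struct model_uobj *obj)
{
	assert(obj->id >= 0 && obj->id < MODEL_MAX_IDS);
	ctx->slots[obj->id] = NULL;
	free(obj);
}

/* Final stage: only flips a flag, so it has no failure path. */
static void model_uobj_enable(struct model_uobj *obj)
{
	obj->visible = true;
}

/* Lookups ignore objects that have not been enabled yet. */
static struct model_uobj *model_lookup(struct model_ctx *ctx, int id)
{
	struct model_uobj *obj;

	if (id < 0 || id >= MODEL_MAX_IDS)
		return NULL;
	obj = ctx->slots[id];
	return (obj && obj->visible) ? obj : NULL;
}
```

Because every step that can fail happens before model_uobj_enable(), a
failing handler only needs model_uobj_abort() and nothing half-built is ever
visible to lookups.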

>> order of destruction. This information is embedded in the same
>> table describing the various action allowed on the object, similarly
>> to object oriented programming.
>>
>> When a ucontext is created, a new list is created in this ib_ucontext.
>> This list contains all objects created under this ib_ucontext.
>> When a ib_ucontext is destroyed, we traverse this list several time
>> destroying the various objects by the order mentioned in the object
>> type description. If few object types have the same destruction order,
>> they are destroyed in an order opposite to their creation order.
>
> Could we simply walk the list backwards, destroying all objects with a reference count of 1 - repeat if necessary?  Basically avoid complex rules for this.
>

That's problematic in the MW case. A MW could be disassociated from
its MR by a remote peer. The kernel can't follow that.

> In fact, it would be great if we could just cleanup the list in the reverse order that items were created.  Maybe this requires supporting a pre-cleanup handler, so that the driver can pluck items out of the list that may need to be destroyed out of order.
>

So that's essentially one layer of ordering. Why do you consider a
driver iterating over all objects simpler than this model?
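
To make the comparison concrete, here is a rough userspace sketch of the
ordering rule from the commit message: one pass per destruction order,
and within a pass the most recently created object goes first. This is
not the patch's code. struct uobj, destroy_all() and the array-based
"list" are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a user object: only what the ordering needs. */
struct uobj {
	int id;             /* creation sequence number */
	unsigned int order; /* destruction order of the object's type */
	int destroyed;
};

/*
 * Destroy objs[0..n-1], which are stored in creation order.
 * Lower orders are destroyed first; within one order, objects are
 * destroyed in reverse creation order. The ids are written to out[]
 * in the order they were destroyed. Returns the number destroyed.
 */
static size_t destroy_all(struct uobj *objs, size_t n,
			  unsigned int max_order, int *out)
{
	size_t done = 0;
	unsigned int order;

	assert(objs && out);

	for (order = 0; order <= max_order; order++) {
		size_t i;

		/* Walk backwards so newer objects go before older ones. */
		for (i = n; i-- > 0; ) {
			if (objs[i].destroyed || objs[i].order != order)
				continue;
			objs[i].destroyed = 1;
			out[done++] = objs[i].id;
		}
	}

	return done;
}
```

The cost is one list walk per order value, which is the "one layer of
ordering" mentioned above; a type only has to declare its order once.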

>> Adding an object is done in two parts.
>> First, an object is allocated and added to IDR/fd table. Then, the
>> command's handlers (in downstream patches) could work on this object
>> and fill in its required details.
>> After a successful command, ib_uverbs_uobject_enable is called and
>> this user objects becomes ucontext visible.
>
> If you have a way to mark that an object is used for exclusive access, you may be able to use that instead of introducing a new variable.  (I.e. acquire the object's write lock).  I think we want to make an effort to minimize the size of the kernel structure needed to track every user space object (within reason).
>

I didn't really follow. A command attribute states the nature of the
locking (for example, in MODIFY_QP the QP could be exclusively locked,
but in QUERY_QP it's only locked for reading). I don't want to take a
blocking lock, because then I could face a deadlock (user-space could
pass parameters in a colliding order). That could be solved by sorting
the handles, but it would degrade performance without a good reason.
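
As an illustration of the trylock idea, here is a small userspace
sketch. It is only a model under my own assumptions: struct obj,
try_lock_obj() and lock_all() are made-up names, and a C11 atomic
counter stands in for the rw_semaphore. Every handle is try-locked in
the order the user passed it; on the first failure everything taken so
far is released and -EBUSY is returned, so two commands that pass the
same handles in opposite orders can never wait on each other.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

enum access { ACCESS_READ, ACCESS_WRITE };

/* usecnt: -1 = one exclusive holder, 0 = free, n > 0 = n readers */
struct obj {
	atomic_int usecnt;
};

/* Never blocks: returns 0 on success, -EBUSY if the object is taken. */
static int try_lock_obj(struct obj *o, enum access a)
{
	int cur = atomic_load(&o->usecnt);

	if (a == ACCESS_WRITE) {
		int free_val = 0;

		return atomic_compare_exchange_strong(&o->usecnt,
						      &free_val, -1) ?
			0 : -EBUSY;
	}

	do {
		if (cur < 0)
			return -EBUSY;	/* held exclusively */
	} while (!atomic_compare_exchange_weak(&o->usecnt, &cur, cur + 1));

	return 0;
}

static void unlock_obj(struct obj *o, enum access a)
{
	if (a == ACCESS_WRITE) {
		assert(atomic_load(&o->usecnt) == -1);
		atomic_store(&o->usecnt, 0);
	} else {
		atomic_fetch_sub(&o->usecnt, 1);
	}
}

/* Lock all handles in caller order; back out completely on failure. */
static int lock_all(struct obj **objs, const enum access *acc, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (try_lock_obj(objs[i], acc[i])) {
			while (i-- > 0)
				unlock_obj(objs[i], acc[i]);
			return -EBUSY;
		}
	}

	return 0;
}
```

The trade-off is that a colliding command fails with -EBUSY instead of
waiting, but no sorting of the handles is needed on the fast path.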

>> Removing an uboject is done by calling ib_uverbs_uobject_remove.
>>
>> We should make sure IDR (per-device) and list (per-ucontext) could
>> be accessed concurrently without corrupting them.
>>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>
> As a general comment, I do have concerns that the resulting generalized parsing of everything will negatively impact performance for operations that do have to transition into the kernel.  Not all devices offload all operations to user space.  Plus the resulting code is extremely difficult to read and non-trivial to use.  It's equivalent to reading C++ code that has 4 layers of inheritance with overrides to basic operators...

There are two parts here. I think the handlers themselves are simpler,
easier to read and less error-prone. They contain less code
duplication. The macro-based definition language explicitly declares
all attributes, their types, sizes, etc.
The model here is a bit more complex, as we want to achieve both code
reuse and adding/overriding of types/actions/attributes.


>
> Pre and post operators per command that can do straightforward validation seem like a better option.
>
>

I think that would duplicate a lot of code and would be more
error-prone than one infrastructure that automates all that work for
you.

>>  drivers/infiniband/core/Makefile      |   3 +-
>>  drivers/infiniband/core/device.c      |   1 +
>>  drivers/infiniband/core/rdma_core.c   | 489
>> ++++++++++++++++++++++++++++++++++
>>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>>  drivers/infiniband/core/uverbs.h      |   1 +
>>  drivers/infiniband/core/uverbs_main.c |   2 +-
>>  include/rdma/ib_verbs.h               |  28 +-
>>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>>  8 files changed, 789 insertions(+), 5 deletions(-)
>>  create mode 100644 drivers/infiniband/core/rdma_core.c
>>  create mode 100644 drivers/infiniband/core/rdma_core.h
>>  create mode 100644 include/rdma/uverbs_ioctl.h
>>
>> diff --git a/drivers/infiniband/core/Makefile
>> b/drivers/infiniband/core/Makefile
>> index edaae9f..1819623 100644
>> --- a/drivers/infiniband/core/Makefile
>> +++ b/drivers/infiniband/core/Makefile
>> @@ -28,4 +28,5 @@ ib_umad-y :=                        user_mad.o
>>
>>  ib_ucm-y :=                  ucm.o
>>
>> -ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o
>> +ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o \
>> +                             rdma_core.o
>> diff --git a/drivers/infiniband/core/device.c
>> b/drivers/infiniband/core/device.c
>> index c3b68f5..43994b1 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>>       spin_lock_init(&device->client_data_lock);
>>       INIT_LIST_HEAD(&device->client_data_list);
>>       INIT_LIST_HEAD(&device->port_list);
>> +     INIT_LIST_HEAD(&device->type_list);
>>
>>       return device;
>>  }
>> diff --git a/drivers/infiniband/core/rdma_core.c
>> b/drivers/infiniband/core/rdma_core.c
>> new file mode 100644
>> index 0000000..337abc2
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.c
>> @@ -0,0 +1,489 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include <linux/file.h>
>> +#include <linux/anon_inodes.h>
>> +#include <rdma/ib_verbs.h>
>> +#include "uverbs.h"
>> +#include "rdma_core.h"
>> +#include <rdma/uverbs_ioctl.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type)
>> +{
>> +     const struct uverbs_types_group *groups = ibdev->types_group;
>> +     const struct uverbs_types *types;
>> +     int ret = groups->dist(&type, groups->priv);
>> +
>> +     if (ret >= groups->num_groups)
>> +             return NULL;
>> +
>> +     types = groups->type_groups[ret];
>> +
>> +     if (type >= types->num_types)
>> +             return NULL;
>> +
>> +     return types->types[type];
>> +}
>> +
>> +static int uverbs_lock_object(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_READ)
>> +             return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>> +
>> +     /* lock is either WRITE or DESTROY - should be exclusive */
>> +     return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>
> This function could take the lock type directly (read or write), versus inferring it based on some other access type.
>

We can, but since we use these enums in the attribute specifications,
I thought it could be more convenient.

>> +}
>> +
>> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
>> *context)
>> +{
>> +     struct ib_uobject *uobj;
>> +
>> +     rcu_read_lock();
>> +     uobj = idr_find(&context->device->idr, id);
>> +     if (uobj && uobj->live) {
>> +             if (uobj->context != context)
>> +                     uobj = NULL;
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_ucontext_lock {
>> +     struct kref  ref;
>> +     /* locking the uobjects_list */
>> +     struct mutex lock;
>> +};
>> +
>> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
>> +{
>> +     mutex_init(&lock->lock);
>> +     kref_init(&lock->ref);
>> +}
>> +
>> +static void release_uobjects_list_lock(struct kref *ref)
>> +{
>> +     struct ib_ucontext_lock *lock = container_of(ref,
>> +                                                  struct ib_ucontext_lock,
>> +                                                  ref);
>> +
>> +     kfree(lock);
>> +}
>> +
>> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
>> +                   struct ib_ucontext *context)
>> +{
>> +     init_rwsem(&uobj->usecnt);
>> +     uobj->user_handle = user_handle;
>> +     uobj->context     = context;
>> +     uobj->live        = 0;
>> +}
>> +
>> +static int add_uobj(struct ib_uobject *uobj)
>> +{
>> +     int ret;
>> +
>> +     idr_preload(GFP_KERNEL);
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +
>> +     ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
>> GFP_NOWAIT);
>> +     if (ret >= 0)
>> +             uobj->id = ret;
>> +
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +     idr_preload_end();
>> +
>> +     return ret < 0 ? ret : 0;
>> +}
>> +
>> +static void remove_uobj(struct ib_uobject *uobj)
>> +{
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +     idr_remove(&uobj->context->device->idr, uobj->id);
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +}
>> +
>> +static void put_uobj(struct ib_uobject *uobj)
>> +{
>> +     kfree_rcu(uobj, rcu);
>> +}
>> +
>> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext
>> *ucontext,
>> +                                                const struct
>> uverbs_type_alloc_action *type,
>> +                                                u32 idr,
>> +                                                enum uverbs_idr_access access)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     rcu_read_lock();
>> +     uobj = get_uobj(idr, ucontext);
>> +     if (!uobj)
>> +             goto free;
>> +
>> +     if (uobj->type != type) {
>> +             uobj = NULL;
>> +             goto free;
>> +     }
>> +
>> +     ret = uverbs_lock_object(uobj, access);
>> +     if (ret)
>> +             uobj = ERR_PTR(ret);
>> +free:
>> +     rcu_read_unlock();
>> +     return uobj;
>> +
>> +     return NULL;
>> +}
>> +
>> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
>> +                              const struct uverbs_type_alloc_action
>> *uobject_type)
>> +{
>> +     uobject->type = uobject_type;
>> +     return add_uobj(uobject);
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             /* lock idr */
>
> Command to lock idr, but no lock is obtained.
>

ib_uverbs_uobject_add calls add_uobj which locks the IDR.

>> +             ret = ib_uverbs_uobject_add(uobj, type);
>> +             if (ret) {
>> +                     kfree(uobj);
>> +                     return ERR_PTR(ret);
>> +             }
>> +
>> +     } else {
>> +             uobj = get_uobject_from_context(ucontext, type, idr,
>> +                                             access);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOENT);
>> +     }
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             int _fd;
>> +             struct ib_uobject *uobj = NULL;
>> +             struct file *filp;
>> +
>> +             _fd = get_unused_fd_flags(O_CLOEXEC);
>> +             if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
>> ib_uobject)))
>> +                     return ERR_PTR(_fd);
>> +
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             filp = anon_inode_getfile(type->fd.name, type->fd.fops,
>> +                                       uobj + 1, type->fd.flags);
>> +             if (IS_ERR(filp)) {
>> +                     put_unused_fd(_fd);
>> +                     kfree(uobj);
>> +                     return (void *)filp;
>> +             }
>> +
>> +             uobj->type = type;
>> +             uobj->id = _fd;
>> +             uobj->object = filp;
>> +
>> +             return uobj;
>> +     } else if (access == UVERBS_IDR_ACCESS_READ) {
>> +             struct file *f = fget(fd);
>> +             struct ib_uobject *uobject;
>> +
>> +             if (!f)
>> +                     return ERR_PTR(-EBADF);
>> +
>> +             uobject = f->private_data - sizeof(struct ib_uobject);
>> +             if (f->f_op != type->fd.fops ||
>> +                 !uobject->live) {
>> +                     fput(f);
>> +                     return ERR_PTR(-EBADF);
>> +             }
>> +
>> +             /*
>> +              * No need to protect it with a ref count, as fget
>> increases
>> +              * f_count.
>> +              */
>> +             return uobject;
>> +     } else {
>> +             return ERR_PTR(-EOPNOTSUPP);
>> +     }
>> +}
>> +
>> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
>> +{
>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_add(&uobject->list, &uobject->context->uobjects);
>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>
> Why not just insert the object into the list on creation?
>
>> +     uobject->live = 1;
>
> See my comments above on removing the live field.
>

Seems that the list could suffice, but I'll look into that.

>> +}
>> +
>> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool
>> lock)
>> +{
>> +     /*
>> +      * Calling remove requires exclusive access, so it's not possible
>> +      * another thread will use our object.
>> +      */
>> +     uobject->live = 0;
>> +     uobject->type->free_fn(uobject->type, uobject);
>> +     if (lock)
>> +             mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_del(&uobject->list);
>> +     if (lock)
>> +             mutex_unlock(&uobject->context->uobjects_lock->lock);
>> +     remove_uobj(uobject);
>> +     put_uobj(uobject);
>> +}
>> +
>> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access,
>> +                           bool success)
>> +{
>> +     switch (access) {
>> +     case UVERBS_IDR_ACCESS_READ:
>> +             up_read(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_NEW:
>> +             if (success) {
>> +                     ib_uverbs_uobject_enable(uobj);
>> +             } else {
>> +                     remove_uobj(uobj);
>> +                     put_uobj(uobj);
>> +             }
>> +             break;
>> +     case UVERBS_IDR_ACCESS_WRITE:
>> +             up_write(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_DESTROY:
>> +             if (success)
>> +                     ib_uverbs_uobject_remove(uobj, true);
>> +             else
>> +                     up_write(&uobj->usecnt);
>> +             break;
>> +     }
>> +}
>> +
>> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
>> +                          enum uverbs_idr_access access,
>> +                          bool success)
>> +{
>> +     struct file *filp = uobj->object;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             if (success) {
>> +                     kref_get(&uobj->context->ufile->ref);
>> +                     uobj->uobjects_lock = uobj->context->uobjects_lock;
>> +                     kref_get(&uobj->uobjects_lock->ref);
>> +                     ib_uverbs_uobject_enable(uobj);
>> +                     fd_install(uobj->id, uobj->object);
>
> I don't get this.  The function is unlocking something, but there are calls to get krefs?
>

Before invoking the user's callback, we're first locking all objects
and afterwards we're unlocking them. When we need to create a new
object, the lock becomes object creation and the unlock could become
(assuming the user's callback succeeded) enabling this new object.
When you add a new object (or fd in this case), we take a reference
count to both the uverbs_file and the locking context.
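
A stripped-down userspace sketch of this two-phase flow for a new
object, under my own simplifications (no IDR, no fd, no krefs; struct
ctx, lock_new() and unlock_new() are made-up names and not the patch's
functions): the "lock" phase creates an object that nobody else can see
yet, and the "unlock" phase either publishes it or throws it away,
depending on whether the handler succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj;

/* Hypothetical context: just the list of objects visible to it. */
struct ctx {
	struct obj *head;
};

struct obj {
	struct obj *next;
	struct ctx *ctx;
	int live;	/* 0 until the command has succeeded */
};

/* "Lock" phase for a new object: allocate it, but don't publish it. */
static struct obj *lock_new(struct ctx *ctx)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->ctx = ctx;
	return o;
}

/*
 * "Unlock" phase for a new object. On success the object becomes
 * visible in the context's list; on failure it is freed as if it had
 * never existed. Returns the published object or NULL.
 */
static struct obj *unlock_new(struct obj *o, bool success)
{
	assert(o && !o->live);

	if (!success) {
		free(o);
		return NULL;
	}

	o->next = o->ctx->head;
	o->ctx->head = o;
	o->live = 1;
	return o;
}
```

In the fd case the success branch is also the natural place to take the
extra references, since that is the point where the object starts to
outlive the command.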

>> +             } else {
>> +                     fput(uobj->object);
>> +                     put_unused_fd(uobj->id);
>> +                     kfree(uobj);
>> +             }
>> +     } else {
>> +             fput(filp);
>> +     }
>> +}
>> +
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success)
>> +{
>> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +             uverbs_unlock_idr(uobj, access, success);
>> +     else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
>> +             uverbs_unlock_fd(uobj, access, success);
>> +     else
>> +             WARN_ON(true);
>> +}
>> +
>> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
>> +{
>> +     /*
>> +      * user should release the uobject in the release
>> +      * callback.
>> +      */
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             uobject->type->free_fn(uobject->type, uobject);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +}
>> +
>> +void ib_uverbs_close_fd(struct file *f)
>> +{
>> +     struct ib_uobject *uobject = f->private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     mutex_lock(&uobject->uobjects_lock->lock);
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +     mutex_unlock(&uobject->uobjects_lock->lock);
>> +     kref_put(&uobject->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +void ib_uverbs_cleanup_fd(void *private_data)
>> +{
>> +     struct ib_uboject *uobject = private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     kfree(uobject);
>> +}
>> +
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success)
>> +{
>> +     unsigned int i;
>> +
>> +     for (i = 0; i < num; i++) {
>> +             struct uverbs_attr_array *attr_spec_array = &attr_array[i];
>> +             const struct uverbs_attr_group_spec *group_spec =
>> +                     spec->attr_groups[i];
>> +             unsigned int j;
>> +
>> +             for (j = 0; j < attr_spec_array->num_attrs; j++) {
>> +                     struct uverbs_attr *attr = &attr_spec_array-
>> >attrs[j];
>> +                     struct uverbs_attr_spec *spec = &group_spec-
>> >attrs[j];
>> +
>> +                     if (!attr->valid)
>> +                             continue;
>> +
>> +                     if (spec->type == UVERBS_ATTR_TYPE_IDR ||
>> +                         spec->type == UVERBS_ATTR_TYPE_FD)
>> +                             /*
>> +                              * refcounts should be handled at the object
>> +                              * level and not at the uobject level.
>> +                              */
>> +                             uverbs_unlock_object(attr->obj_attr.uobject,
>> +                                                  spec->obj.access, success);
>> +             }
>> +     }
>> +}
>> +
>> +static unsigned int get_type_orders(const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int i;
>> +     unsigned int max = 0;
>> +
>> +     for (i = 0; i < types_group->num_groups; i++) {
>> +             unsigned int j;
>> +             const struct uverbs_types *types = types_group-
>> >type_groups[i];
>> +
>> +             for (j = 0; j < types->num_types; j++) {
>> +                     if (!types->types[j] || !types->types[j]->alloc)
>> +                             continue;
>> +                     if (types->types[j]->alloc->order > max)
>> +                             max = types->types[j]->alloc->order;
>> +             }
>> +     }
>> +
>> +     return max;
>> +}
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int num_orders = get_type_orders(types_group);
>> +     unsigned int i;
>> +
>> +     for (i = 0; i <= num_orders; i++) {
>> +             struct ib_uobject *obj, *next_obj;
>> +
>> +             /*
>> +              * No need to take lock here, as cleanup should be called
>> +              * after all commands finished executing. Newly executed
>> +              * commands should fail.
>> +              */
>> +             mutex_lock(&ucontext->uobjects_lock->lock);
>
> It's really confusing to see a comment about 'no need to take lock' immediately followed by a call to lock.
>

Yeah :) That was before adding the fd. I'll delete the comment.

>> +             list_for_each_entry_safe(obj, next_obj, &ucontext-
>> >uobjects,
>> +                                      list)
>> +                     if (obj->type->order == i) {
>> +                             if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +                                     ib_uverbs_uobject_remove(obj, false);
>> +                             else
>> +                                     ib_uverbs_remove_fd(obj);
>> +                     }
>> +             mutex_unlock(&ucontext->uobjects_lock->lock);
>> +     }
>> +     kref_put(&ucontext->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext)
>
> Please work on the function names.  This is horrendously long and still doesn't help describe what it does.
>

This just initializes the types part of the ucontext. Any suggestions?

>> +{
>> +     ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
>> >uobjects_lock),
>> +                                       GFP_KERNEL);
>> +     if (!ucontext->uobjects_lock)
>> +             return -ENOMEM;
>> +
>> +     init_uobjects_list_lock(ucontext->uobjects_lock);
>> +     INIT_LIST_HEAD(&ucontext->uobjects);
>> +
>> +     return 0;
>> +}
>> +
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext)
>> +{
>> +     kfree(ucontext->uobjects_lock);
>> +}
>
> No need to wrap a call to 'free'.
>

In order to abstract away the ucontext type data structure.

>> +
>> diff --git a/drivers/infiniband/core/rdma_core.h
>> b/drivers/infiniband/core/rdma_core.h
>> new file mode 100644
>> index 0000000..8990115
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.h
>> @@ -0,0 +1,75 @@
>> +/*
>> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
>> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
>> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
>> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef UOBJECT_H
>> +#define UOBJECT_H
>> +
>> +#include <linux/idr.h>
>> +#include <rdma/uverbs_ioctl.h>
>> +#include <rdma/ib_verbs.h>
>> +#include <linux/mutex.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type);
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr);
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd);
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success);
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success);
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group);
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_close_fd(struct file *f);
>> +void ib_uverbs_cleanup_fd(void *private_data);
>> +
>> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
>> +{
>> +     return uobj + 1;
>> +}
>
> This seems like a rather useless function.
>

Why? The user shouldn't know or care how we put our structs together.
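
For example, a userspace sketch of the layout this helper hides (struct
hdr, alloc_with_priv() and the two converters are made-up names, with
struct hdr standing in for struct ib_uobject): the private data simply
follows the header in the same allocation, and only the helpers know
that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the common object header. */
struct hdr {
	int id;
	int live;
};

/* One allocation: the header, immediately followed by private data. */
static struct hdr *alloc_with_priv(size_t priv_size)
{
	return calloc(1, sizeof(struct hdr) + priv_size);
}

/* Header -> private data: the first byte after the header. */
static inline void *hdr_to_priv(struct hdr *h)
{
	assert(h);
	return h + 1;
}

/* Private data -> header: step back by exactly one header. */
static inline struct hdr *priv_to_hdr(void *priv)
{
	assert(priv);
	return (struct hdr *)priv - 1;
}
```

If the layout ever changes (say, padding is added so the private data
gets a stricter alignment), only these helpers need to be touched, and
callers keep working unmodified.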

>> +
>> +#endif /* UIDR_H */
>> diff --git a/drivers/infiniband/core/uverbs.h
>> b/drivers/infiniband/core/uverbs.h
>> index 8074705..ae7d4b8 100644
>> --- a/drivers/infiniband/core/uverbs.h
>> +++ b/drivers/infiniband/core/uverbs.h
>> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
>> *uverbs_file,
>>                                       struct ib_device *ib_dev,
>>                                       int is_async);
>> +void ib_uverbs_release_file(struct kref *ref);
>>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
>> *uverbs_file);
>>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
>>
>> diff --git a/drivers/infiniband/core/uverbs_main.c
>> b/drivers/infiniband/core/uverbs_main.c
>> index f783723..e63357a 100644
>> --- a/drivers/infiniband/core/uverbs_main.c
>> +++ b/drivers/infiniband/core/uverbs_main.c
>> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
>> ib_uverbs_device *dev)
>>       complete(&dev->comp);
>>  }
>>
>> -static void ib_uverbs_release_file(struct kref *ref)
>> +void ib_uverbs_release_file(struct kref *ref)
>>  {
>>       struct ib_uverbs_file *file =
>>               container_of(ref, struct ib_uverbs_file, ref);
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index b5d2075..7240615 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
>>
>>  struct ib_umem;
>>
>> +struct ib_ucontext_lock;
>> +
>>  struct ib_ucontext {
>>       struct ib_device       *device;
>> +     struct ib_uverbs_file  *ufile;
>>       struct list_head        pd_list;
>>       struct list_head        mr_list;
>>       struct list_head        mw_list;
>> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>>       struct list_head        rwq_ind_tbl_list;
>>       int                     closing;
>>
>> +     /* lock for uobjects list */
>> +     struct ib_ucontext_lock *uobjects_lock;
>> +     struct list_head        uobjects;
>> +
>>       struct pid             *tgid;
>>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>       struct rb_root      umem_tree;
>> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>>  #endif
>>  };
>>
>> +struct uverbs_object_list;
>> +
>> +#define OLD_ABI_COMPAT
>> +
>>  struct ib_uobject {
>>       u64                     user_handle;    /* handle given to us by userspace
>> */
>>       struct ib_ucontext     *context;        /* associated user context
>> */
>>       void                   *object;         /* containing object */
>>       struct list_head        list;           /* link to context's list */
>> -     int                     id;             /* index into kernel idr */
>> -     struct kref             ref;
>> -     struct rw_semaphore     mutex;          /* protects .live */
>> +     int                     id;             /* index into kernel idr/fd */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct kref             ref;
>> +#endif
>> +     struct rw_semaphore     usecnt;         /* protects exclusive
>> access */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct rw_semaphore     mutex;          /* protects .live */
>> +#endif
>>       struct rcu_head         rcu;            /* kfree_rcu() overhead */
>>       int                     live;
>> +
>> +     const struct uverbs_type_alloc_action *type;
>> +     struct ib_ucontext_lock *uobjects_lock;
>>  };
>>
>>  struct ib_udata {
>> @@ -2101,6 +2120,9 @@ struct ib_device {
>>        */
>>       int (*get_port_immutable)(struct ib_device *, u8, struct
>> ib_port_immutable *);
>>       void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
>> str_len);
>> +     struct list_head type_list;
>> +
>> +     const struct uverbs_types_group *types_group;
>>  };
>>
>>  struct ib_client {
>> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
>> new file mode 100644
>> index 0000000..2f50045
>> --- /dev/null
>> +++ b/include/rdma/uverbs_ioctl.h
>> @@ -0,0 +1,195 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef _UVERBS_IOCTL_
>> +#define _UVERBS_IOCTL_
>> +
>> +#include <linux/kernel.h>
>> +
>> +struct uverbs_object_type;
>> +struct ib_ucontext;
>> +struct ib_uobject;
>> +struct ib_device;
>> +struct uverbs_uobject_type;
>> +
>> +/*
>> + * =======================================
>> + *   Verbs action specifications
>> + * =======================================
>> + */
>
> I intentionally used urdma (though condensed to 3 letters that I don't recall atm), rather than uverbs.  This will need to work with non-verbs devices and interfaces -- again, consider how this fits with the rdma cm.  Verbs has a very specific meaning, which gets lost if we start referring to everything as 'verbs'.  It's bad enough that we're stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also means ethernet and rdma means nothing.
>

IMHO, let's first agree on the concept of this infrastructure. Once we
decide its scope, we could generalize it (i.e. ioctl_provider and
ioctl_context) and implement it for rdma-cm as well.

>> +
>> +enum uverbs_attr_type {
>> +     UVERBS_ATTR_TYPE_PTR_IN,
>> +     UVERBS_ATTR_TYPE_PTR_OUT,
>> +     UVERBS_ATTR_TYPE_IDR,
>> +     UVERBS_ATTR_TYPE_FD,
>> +};
>> +
>> +enum uverbs_idr_access {
>> +     UVERBS_IDR_ACCESS_READ,
>> +     UVERBS_IDR_ACCESS_WRITE,
>> +     UVERBS_IDR_ACCESS_NEW,
>> +     UVERBS_IDR_ACCESS_DESTROY
>> +};
>> +
>> +struct uverbs_attr_spec {
>> +     u16                             len;
>> +     enum uverbs_attr_type           type;
>> +     struct {
>> +             u16                     obj_type;
>> +             u8                      access;
>
> Is access intended to be an enum uverbs_idr_access value?
>

Yeah, worth using this enum. Thanks.

>> +     } obj;
>
> I would remove (flatten) the substructure and re-order the fields for better alignment.
>

I noticed there are several places which aren't aligned. It's on my todo list.

>> +};
>> +
>> +struct uverbs_attr_group_spec {
>> +     struct uverbs_attr_spec         *attrs;
>> +     size_t                          num_attrs;
>> +};
>> +
>> +struct uverbs_action_spec {
>> +     const struct uverbs_attr_group_spec             **attr_groups;
>> +     /* if > 0 -> validator, otherwise, error */
>
> ? not sure what this comment means
>
>> +     int (*dist)(__u16 *attr_id, void *priv);
>
> What does 'dist' stand for?
>

dist = distribution function.
It maps the attributes you got from user-space to your groups. You
could think of each group as a namespace, where its attributes (or
types/actions) start from zero for the sake of compactness.
So, for example, it gets an attribute 0x8010 and maps it to "group 1"
(provider) and attribute 0x10.
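
As a rough illustration of the mapping described above, a distribution
function could look like the sketch below. The 0x8000 provider bit, the
group numbering, the return-value convention and the name
example_attr_dist are assumptions made for this example; they are not
taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the top bit of the 16-bit id selects the provider group. */
#define EXAMPLE_PROVIDER_BIT 0x8000u

enum {
	EXAMPLE_GROUP_COMMON = 0,
	EXAMPLE_GROUP_PROVIDER = 1,
};

/*
 * Same shape as the dist callback in the patch: rewrite *attr_id to the
 * index inside its group and return the group number, or a negative
 * errno value on error (hypothetical convention).
 */
static int example_attr_dist(uint16_t *attr_id, void *priv)
{
	(void)priv;	/* unused in this sketch */

	if (!attr_id)
		return -EINVAL;

	if (*attr_id & EXAMPLE_PROVIDER_BIT) {
		*attr_id &= (uint16_t)~EXAMPLE_PROVIDER_BIT;
		return EXAMPLE_GROUP_PROVIDER;
	}
	return EXAMPLE_GROUP_COMMON;
}
```

With this layout, an id of 0x8010 comes back as group 1 with the id
rewritten to 0x10, while ids below 0x8000 stay in group 0 unchanged.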

>> +     void                                            *priv;
>> +     size_t                                          num_groups;
>> +};
>> +
>> +struct uverbs_attr_array;
>> +struct ib_uverbs_file;
>> +
>> +struct uverbs_action {
>> +     struct uverbs_action_spec spec;
>> +     void *priv;
>> +     int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
>> *ufile,
>> +                    struct uverbs_attr_array *ctx, size_t num, void
>> *priv);
>> +};
>> +
>> +struct uverbs_type_alloc_action;
>> +typedef void (*free_type)(const struct uverbs_type_alloc_action
>> *uobject_type,
>> +                       struct ib_uobject *uobject);
>> +
>> +struct uverbs_type_alloc_action {
>> +     enum uverbs_attr_type           type;
>> +     int                             order;
>
> I think this is being used as destroy order, in which case I would rename it to clarify the intent.  Though I'd prefer we come up with a more efficient destruction mechanism than the repeated nested looping.
>

In one of the earlier revisions I used a sorted list, which was
efficient. I recall that Jason didn't like its complexity and,
thinking about it again, he's right. Most of your types are "order
number" 0 anyway. So you'll probably iterate over very few objects in the
next round (in verbs, everything but MRs and PDs).
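
To make the trade-off concrete, here is a minimal userspace sketch of
destruction in "order" rounds. It is not the code from the patch: the
struct, the array-based storage and the name example_destroy_all are
hypothetical. A list-based version would presumably unlink each object
as it is released, so later rounds would walk fewer entries than this
array version does.

```c
#include <stddef.h>

struct example_obj {
	int order;	/* destroy round: lower values are released first */
	int destroyed;	/* sequence number at which it was released, 0 if live */
};

/*
 * Release objects in ascending "order" rounds. Each round rescans the
 * array, so the cost is (max_order + 1) * n comparisons.
 */
static int example_destroy_all(struct example_obj *objs, size_t n,
			       int max_order)
{
	int seq = 0;

	for (int round = 0; round <= max_order; round++) {
		for (size_t i = 0; i < n; i++) {
			if (objs[i].order == round && !objs[i].destroyed)
				objs[i].destroyed = ++seq;
		}
	}
	return seq;	/* number of objects released */
}
```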

>> +     size_t                          obj_size;
>
> This can be alloc_fn
>
>> +     free_type                       free_fn;
>> +     struct {
>> +             const struct file_operations    *fops;
>> +             const char                      *name;
>> +             int                             flags;
>> +     } fd;
>> +};
>> +
>> +struct uverbs_type_actions_group {
>> +     size_t                                  num_actions;
>> +     const struct uverbs_action              **actions;
>> +};
>> +
>> +struct uverbs_type {
>> +     size_t                                  num_groups;
>> +     const struct uverbs_type_actions_group  **action_groups;
>> +     const struct uverbs_type_alloc_action   *alloc;
>> +     int (*dist)(__u16 *action_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +struct uverbs_types {
>> +     size_t                                  num_types;
>> +     const struct uverbs_type                **types;
>> +};
>> +
>> +struct uverbs_types_group {
>> +     const struct uverbs_types               **type_groups;
>> +     size_t                                  num_groups;
>> +     int (*dist)(__u16 *type_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +/* =================================================
>> + *              Parsing infrastructure
>> + * =================================================
>> + */
>> +
>> +struct uverbs_ptr_attr {
>> +     void    * __user ptr;
>> +     __u16           len;
>> +};
>> +
>> +struct uverbs_fd_attr {
>> +     int             fd;
>> +};
>> +
>> +struct uverbs_uobj_attr {
>> +     /*  idr handle */
>> +     __u32   idr;
>> +};
>> +
>> +struct uverbs_obj_attr {
>> +     /* pointer to the kernel descriptor -> type, access, etc */
>> +     const struct uverbs_attr_spec *val;
>> +     struct ib_uverbs_attr __user    *uattr;
>> +     const struct uverbs_type_alloc_action   *type;
>> +     struct ib_uobject               *uobject;
>> +     union {
>> +             struct uverbs_fd_attr           fd;
>> +             struct uverbs_uobj_attr         uobj;
>> +     };
>> +};
>> +
>> +struct uverbs_attr {
>> +     bool valid;
>> +     union {
>> +             struct uverbs_ptr_attr  cmd_attr;
>> +             struct uverbs_obj_attr  obj_attr;
>> +     };
>> +};
>
> It's odd to have a union that's part of a structure without some field to indicate which union field is accessible.
>

You index this array by the attribute id from the user's callback
function. The user should know the type of the attribute, as
[s]he declared the specification.
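
A simplified sketch of that access pattern, with cut-down stand-ins for
the structures in the patch. All example_* names and the attribute ids
are hypothetical; the point is only that the handler picks the union
member matching the type it declared for that id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum example_attr_id {
	EXAMPLE_ATTR_CMD,	/* declared as a pointer attribute */
	EXAMPLE_ATTR_SEND_CQ,	/* declared as an object (idr) attribute */
	EXAMPLE_ATTR_MAX,
};

struct example_ptr_attr {
	const void *ptr;
	uint16_t len;
};

struct example_obj_attr {
	uint32_t idr;
};

struct example_attr {
	bool valid;
	union {
		struct example_ptr_attr cmd_attr;
		struct example_obj_attr obj_attr;
	};
};

/* The handler declared SEND_CQ as an object attribute, so it reads obj_attr. */
static int example_get_send_cq(const struct example_attr *attrs, size_t num,
			       uint32_t *idr)
{
	if (num <= EXAMPLE_ATTR_SEND_CQ || !attrs[EXAMPLE_ATTR_SEND_CQ].valid)
		return -1;

	*idr = attrs[EXAMPLE_ATTR_SEND_CQ].obj_attr.idr;
	return 0;
}
```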

>> +
>> +/* output of one validator */
>> +struct uverbs_attr_array {
>> +     size_t num_attrs;
>> +     /* arrays of attrubytes, index is the id i.e SEND_CQ */
>> +     struct uverbs_attr *attrs;
>> +};
>> +
>> +/* =================================================
>> + *              Types infrastructure
>> + * =================================================
>> + */
>> +
>> +int ib_uverbs_uobject_type_add(struct list_head      *head,
>> +                            void (*free)(struct uverbs_uobject_type *type,
>> +                                         struct ib_uobject *uobject,
>> +                                         struct ib_ucontext *ucontext),
>> +                            uint16_t obj_type);
>> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
>> +
>> +#endif
>> --
>> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks for taking a look.

Regards,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 perftest] Support for Chelsio T6 devices
From: Leon Romanovsky @ 2016-10-31 23:11 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Gil Rockah', 'Zohar Ben Aharon',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <010701d2305d$27a9fff0$76fdffd0$@opengridcomputing.com>

On Thu, Oct 27, 2016 at 09:19:40AM -0500, Steve Wise wrote:
> > > Hey guys,
> > >
> > > Has this patch been integrated yet?  Also, where is the official upstream
> > > perftest git repo now?
> >
> > Hi Steve,
> >
> > Sorry for the late response, due to the holidays our responses are
> > delaying a little bit.
> >
> > We moved perftest repo to be under github's linux-rdma organization [1]
> > and it is now [2].
> >
> > I'll remind to Zohar to take it.
> >
> > [1] https://github.com/linux-rdma/
> > [2] https://github.com/linux-rdma/perftest
>
>
> Thanks for the update!

Hi Steve,
I merged both of your patches
https://patchwork.kernel.org/patch/9341763/
https://patchwork.kernel.org/patch/9219151/
https://github.com/linux-rdma/perftest/pull/1

Sorry for the delay.

>
> Steve.
>
>

^ permalink raw reply

* [PATCH] net/mlx5: Simplify a test
From: Christophe JAILLET @ 2016-11-01  7:10 UTC (permalink / raw)
  To: matanb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA, Christophe JAILLET

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet-39ZsbGIQGT5GWvitb5QawA@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 904853f9cf7a..330955f6badc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering *steering)
 {
 
 	steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
-	if (IS_ERR_OR_NULL(steering->root_ns))
+	if (!steering->root_ns)
 		goto cleanup;
 
 	if (init_root_tree(steering, &root_fs, &steering->root_ns->ns.node))
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH] net/mlx5: Simplify a test
From: Matan Barak @ 2016-11-01  9:38 UTC (permalink / raw)
  To: Christophe JAILLET, leonro
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20161101071053.12486-1-christophe.jaillet@wanadoo.fr>

On 01/11/2016 09:10, Christophe JAILLET wrote:
> 'create_root_ns()' does not return an error pointer, so the test can be
> simplified to be more consistent.
>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index 904853f9cf7a..330955f6badc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering *steering)
>  {
>
>  	steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
> -	if (IS_ERR_OR_NULL(steering->root_ns))
> +	if (!steering->root_ns)
>  		goto cleanup;
>
>  	if (init_root_tree(steering, &root_fs, &steering->root_ns->ns.node))
>

Thanks.
Acked-by: Matan Barak <matanb@mellanox.com>
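
For readers less familiar with the error-pointer convention, here is a
userspace sketch of why the two tests differ. The example_* helpers
only mimic the idea behind ERR_PTR()/IS_ERR() and are not the kernel
implementation. If a function reports failure only by returning NULL,
as the commit message says create_root_ns() does, the plain NULL test
covers every failure case.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are encoded in the last 4095 values of the address space. */
#define EXAMPLE_MAX_ERRNO 4095

static void *example_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int example_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-EXAMPLE_MAX_ERRNO);
}

static int example_is_err_or_null(const void *p)
{
	return !p || example_is_err(p);
}
```

Note that NULL is not an error pointer in this scheme, which is why
code that can see both kinds of failure needs the combined check.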

^ permalink raw reply

* Re: [PATCH] net/mlx5: Simplify a test
From: Saeed Mahameed @ 2016-11-01  9:59 UTC (permalink / raw)
  To: Christophe JAILLET, matanb, leonro
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20161101071053.12486-1-christophe.jaillet@wanadoo.fr>



On 11/01/2016 09:10 AM, Christophe JAILLET wrote:
> 'create_root_ns()' does not return an error pointer, so the test can be
> simplified to be more consistent.
>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* Re: [PATCH rdma-next 2/4] IB/core: Support rate limit for packet pacing
From: Yuval Shaia @ 2016-11-01 10:06 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-3-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Two (extremely) minor suggestions inline.

Yuval

On Mon, Oct 31, 2016 at 12:21:35PM +0200, Leon Romanovsky wrote:
> From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add new member rate_limit to ib_qp_attr, it shows the packet pacing rate

Suggesting to replace with:
Add new member rate_limit to ib_qp_attr which holds the packet pacing rate

> in Kbps, 0 means unlimited.
> 
> IB_QP_RATE_LIMIT is added to ib_attr_mask, and it could be used by RAW

Suggesting to replace with:
IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW

> QPs when changing QP state from RTR to RTS, RTS to RTS.
> 
> Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/core/verbs.c | 2 ++
>  include/rdma/ib_verbs.h         | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 8368764..3e688b3 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1014,6 +1014,7 @@ static const struct {
>  						 IB_QP_QKEY),
>  				 [IB_QPT_GSI] = (IB_QP_CUR_STATE		|
>  						 IB_QP_QKEY),
> +				 [IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
>  			 }
>  		}
>  	},
> @@ -1047,6 +1048,7 @@ static const struct {
>  						IB_QP_QKEY),
>  				[IB_QPT_GSI] = (IB_QP_CUR_STATE			|
>  						IB_QP_QKEY),
> +				[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
>  			}
>  		},
>  		[IB_QPS_SQD]   = {
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 5ad43a4..a065361 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1102,6 +1102,7 @@ enum ib_qp_attr_mask {
>  	IB_QP_RESERVED2			= (1<<22),
>  	IB_QP_RESERVED3			= (1<<23),
>  	IB_QP_RESERVED4			= (1<<24),
> +	IB_QP_RATE_LIMIT		= (1<<25),
>  };
>  
>  enum ib_qp_state {
> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>  	u8			rnr_retry;
>  	u8			alt_port_num;
>  	u8			alt_timeout;
> +	u32			rate_limit;
>  };
>  
>  enum ib_wr_opcode {
> -- 
> 2.7.4
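
A minimal sketch of the table-driven check that the verbs.c hunks
extend, assuming a single state transition and only the optional
attribute mask. The example_* names and the reduced set of QP types are
hypothetical; only the (1<<25) bit value is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum example_qp_type {
	EXAMPLE_QPT_UD,
	EXAMPLE_QPT_RAW_PACKET,
	EXAMPLE_QPT_MAX,
};

#define EXAMPLE_QP_QKEY		(1u << 6)
#define EXAMPLE_QP_RATE_LIMIT	(1u << 25)

/* Optional attributes allowed per QP type for one state transition. */
static const uint32_t example_opt_param[EXAMPLE_QPT_MAX] = {
	[EXAMPLE_QPT_UD]	 = EXAMPLE_QP_QKEY,
	[EXAMPLE_QPT_RAW_PACKET] = EXAMPLE_QP_RATE_LIMIT,
};

static bool example_modify_is_ok(enum example_qp_type type, uint32_t mask)
{
	/* Reject any bit that is not listed for this QP type. */
	return (mask & ~example_opt_param[type]) == 0;
}
```

Under these assumptions a raw packet QP may pass the rate limit bit,
while the same bit on a UD QP is rejected.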
> 

^ permalink raw reply

* RE: [PATCH -next] qedr: Use list_move_tail instead of list_del/list_add_tail
From: Amrani, Ram @ 2016-11-01 10:28 UTC (permalink / raw)
  To: Wei Yongjun, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Borundia, Rajesh
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477757993-32186-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> Using list_move_tail() instead of list_del() + list_add_tail().
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..cdaddf9 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2413,8 +2413,7 @@ static void handle_completed_mrs(struct qedr_dev
> *dev, struct mr_info *info)
>  		 */
>  		pbl = list_first_entry(&info->inuse_pbl_list,
>  				       struct qedr_pbl, list_entry);
> -		list_del(&pbl->list_entry);
> -		list_add_tail(&pbl->list_entry, &info->free_pbl_list);
> +		list_move_tail(&pbl->list_entry, &info->free_pbl_list);
>  		info->completed_handled++;
>  	}
>  }

Thanks

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
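
For reference, a small userspace sketch of why the single call is
equivalent to the two it replaces. The example_list type is a
hand-rolled circular doubly linked list, not the kernel's list.h, and
all names are hypothetical.

```c
#include <stddef.h>

struct example_list {
	struct example_list *prev, *next;
};

static void example_list_init(struct example_list *head)
{
	head->prev = head->next = head;
}

/* Unlink an entry from whatever list it is currently on. */
static void example_list_del(struct example_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Link an entry just before head, i.e. at the tail of the list. */
static void example_list_add_tail(struct example_list *entry,
				  struct example_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Move is simply unlink followed by add-at-tail. */
static void example_list_move_tail(struct example_list *entry,
				   struct example_list *head)
{
	example_list_del(entry);
	example_list_add_tail(entry, head);
}
```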

^ permalink raw reply

* RE: [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Amrani, Ram @ 2016-11-01 10:38 UTC (permalink / raw)
  To: Wei Yongjun, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Borundia, Rajesh
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477672427-31575-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> 'qp' is malloced in qedr_create_qp() and should be freed before leaving from the
> error handling cases, otherwise it will cause memory leak.
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..b60f145 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -1477,6 +1477,7 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	struct qedr_ucontext *ctx = NULL;
>  	struct qedr_create_qp_ureq ureq;
>  	struct qedr_qp *qp;
> +	struct ib_qp *ibqp;
>  	int rc = 0;
> 
>  	DP_DEBUG(dev, QEDR_MSG_QP, "create qp: called from %s, pd=%p\n",
> @@ -1486,13 +1487,13 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	if (rc)
>  		return ERR_PTR(rc);
> 
> +	if (attrs->srq)
> +		return ERR_PTR(-EINVAL);
> +
>  	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
>  	if (!qp)
>  		return ERR_PTR(-ENOMEM);
> 
> -	if (attrs->srq)
> -		return ERR_PTR(-EINVAL);
> -
>  	DP_DEBUG(dev, QEDR_MSG_QP,
>  		 "create qp: sq_cq=%p, sq_icid=%d, rq_cq=%p, rq_icid=%d\n",
>  		 get_qedr_cq(attrs->send_cq),
> @@ -1508,7 +1509,10 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  			       "create qp: unexpected udata when creating GSI
> QP\n");
>  			goto err0;
>  		}
> -		return qedr_create_gsi_qp(dev, attrs, qp);
> +		ibqp = qedr_create_gsi_qp(dev, attrs, qp);
> +		if (IS_ERR(ibqp))
> +			kfree(qp);
> +		return ibqp;
>  	}
> 
>  	memset(&in_params, 0, sizeof(in_params));

Thanks again

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
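
The two changes in the patch follow a common pattern: validate before
allocating, and free on every failure path after allocating. A
userspace sketch of that pattern follows; the example_* functions and
the allocation counter are hypothetical and only stand in for the
driver code.

```c
#include <stdlib.h>

static int example_live_allocs;	/* outstanding allocations in this sketch */

struct example_qp {
	int is_gsi;
};

static void *example_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		example_live_allocs++;
	return p;
}

static void example_free(void *p)
{
	if (p) {
		example_live_allocs--;
		free(p);
	}
}

/* Hypothetical helper that fails on request; stands in for the GSI path. */
static struct example_qp *example_setup_gsi(struct example_qp *qp, int fail)
{
	return fail ? NULL : qp;
}

static struct example_qp *example_create_qp(int has_srq, int gsi_fails)
{
	struct example_qp *qp, *ret;

	/* Validate arguments first, so an early return cannot leak. */
	if (has_srq)
		return NULL;

	qp = example_alloc(sizeof(*qp));
	if (!qp)
		return NULL;

	/* Once allocated, every failure path must release qp. */
	ret = example_setup_gsi(qp, gsi_fails);
	if (!ret)
		example_free(qp);
	return ret;
}
```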

^ permalink raw reply

* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-01 11:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
	Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
	Hefty, Sean, Jason Gunthorpe, Haggai Eran,
	james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <20161031065441.GY3617-2ukJVAZIZ/Y@public.gmane.org>

On Mon, Oct 31, 2016 at 12:24 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Oct 20, 2016 at 01:48:27AM +0530, Parav Pandit wrote:
>> On Thu, Oct 20, 2016 at 1:35 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > Hello, Parav.
>> >
>> > On Thu, Oct 20, 2016 at 01:24:42AM +0530, Parav Pandit wrote:
>> >> userland can get the max numbers using other framework which is used
>> >> by control & data plane available in C library form or in form of
>> >> system tools.
>> >> I was preferring to get and set through same interface because,
>> >> It simplifies user land software which is often not written in C so
>> >> its likely that it needs to rely on system tools and parse the
>> >> content, iterate through devices etc.
>> >> Getting these info through rdma.max just makes it simple. There will
>> >> be logic built to read/write rdma.max in userland anyway, which can be
>> >> leveraged for percentage calculation instead of doing it from two
>> >> places.
>> >
>> > Yeah, I get that this can be convenient in this case but it isn't a
>> > generic approach.  I'd much prefer keeping it in line with other
>> > resources.
>> >
>> Hmm. we don't have /proc/sys/kernel/pid_max type of simple interface
>> to get the max values for rdma resources.
>> rdma.max is close to that simplicity.
>
> Sorry for my late response (very long weekends and piles of mails after it) and
> for not clarifying our requirements better, which are very simple.
>
> 1. We will have vendor specific vendors objects in the future (new ABI
> support it and designed for that).
I will let others comment on it. The patch_v11 design allowed
vendor specific objects and standard objects to be defined in IB core,
with the rdma cgroup acting as the facilitator. We didn't reach consensus on
that approach.

> 2. We don't want to fight for every addition of such objects to cgroup list.
Same comment as above.

> 3. We don't want to teach and/or rewrite scripts for "average" user after
> addition of new objects.
We can possibly do this by having a new rdma.percentage knob, which gets
configured by default for every new object in the rdma cgroup.
This way the average user/administrator doesn't have to know about it.

> 4. Cgroup configuration should be as close as possible to "standard" if
> such exists, so all infinite internet guides will work for RDMA too.
I didn't follow this comment. Can you please explain? Are you saying
the rdma cgroup should define all the objects of the IB spec?
>
> From my understanding of current status.
> My naive approach of introducing GLOBAL_HCA object is the way to go and the real question
> is to understand how to configure it, am I right?
>
A global object won't work, for the reason below.
Let's take an example that makes life easier.
Let's say two new RDMA objects exist which are not part of the rdma cgroup
standard resource definition,
say, indirection table and PSM tags.
Both are abstracted using one global_hca resource object.
Say it's given 10%.
Now IB core performs charging of each such object using GLOBAL_HCA
(because at the cgroup level there is only one object, GLOBAL_HCA).
So two or more resources are mapped to a single object.
This means one object can be charged more while the total limit is still
under 10%, which leads to the same problem as not having a cgroup at all.
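
A small sketch of the problem, assuming a single pooled limit shared by
two resource types. The example_* names, the -EAGAIN return value and
the limit of 10 are made up for illustration and do not reflect the
rdma cgroup code.

```c
#include <errno.h>

enum example_res {
	EXAMPLE_RES_IND_TBL,
	EXAMPLE_RES_PSM_TAG,
	EXAMPLE_RES_MAX,
};

struct example_pool {
	int limit;			/* one shared limit, as with GLOBAL_HCA */
	int used[EXAMPLE_RES_MAX];	/* per-type usage, not checked separately */
};

/* Charge one object against the shared limit, whatever its type. */
static int example_charge(struct example_pool *pool, enum example_res res)
{
	int total = 0;

	for (int i = 0; i < EXAMPLE_RES_MAX; i++)
		total += pool->used[i];

	if (total >= pool->limit)
		return -EAGAIN;

	pool->used[res]++;
	return 0;
}
```

If one resource type takes all 10 charges, the first request for the
other type already fails, although that type has used nothing.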

So my opinion is:
(a) Let cgroup define the current standard objects and a new reasonable
set of vendor specific objects in the future.
(b) Add a new rdma.percentage parameter so that any new standard object
or vendor specific object can be abstracted from the average end user and
from applications which are yet to catch up.
I believe this takes care of your points (1), (3), (4)?

In another hypothetical design,
we can have the rdma cgroup as just a pid to cgroup mapping facilitator.
All the charging/uncharging logic moves to IB core in the form of a library
that the standard ABI uverbs and the vendor specific layer invoke. In this
approach there will be code duplicated in every such vendor driver.
By doing so, more callbacks will also have to be moved down to IB
core and vendor drivers for cgroup creation/deletion/offline etc.

This also means that the lack of standard object definitions may create
more confusion for end users and orchestration applications. I prefer to
avoid such a design.

Parav

^ permalink raw reply

* [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Leon Romanovsky @ 2016-11-01 12:34 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit

From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO, the
driver warns on every QP creation. It becomes worse when the driver works in
connected mode, where we see one print on each new connection. Print it once
instead.

Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d..917393b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
 
 	tx_qp = ib_create_qp(priv->pd, &attr);
 	if (PTR_ERR(tx_qp) == -EINVAL) {
-		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
-			   priv->ca->name);
+		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
+			     priv->ca->name);
 		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
 		tx_qp = ib_create_qp(priv->pd, &attr);
 	}
-- 
2.7.4

^ permalink raw reply related

* RE: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-01 13:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161030205026.GR3617-2ukJVAZIZ/Y@public.gmane.org>

> We use "rdma-core" notations for patches intended to consolidated library,
> while your patch is for the kernel.

ACK. My bad. Shall I re-send it?

Ram

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Yuval Shaia @ 2016-11-01 13:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <1478003653-16248-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> on every QP creation, It becomes worse when driver works in connected mode
> we will see one print on each new connection, instead do it once.
> 
> Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 4ad297d..917393b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>  
>  	tx_qp = ib_create_qp(priv->pd, &attr);
>  	if (PTR_ERR(tx_qp) == -EINVAL) {
> -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> -			   priv->ca->name);
> +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> +			     priv->ca->name);

But it will still re-print it for a different device, right?

>  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>  		tx_qp = ib_create_qp(priv->pd, &attr);
>  	}
> -- 
> 2.7.4
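
On the per-device question: as far as I know, the kernel's
pr_warn_once() keeps one static flag per call site, so the message
would be emitted once in total rather than once per device. A userspace
sketch of that behaviour follows; the example_* names and the counter
are hypothetical, and the macro only imitates the idea.

```c
#include <stdbool.h>
#include <stdio.h>

static int example_warn_count;	/* number of messages actually emitted */

/* One static flag per expansion site, so the message is emitted once. */
#define example_warn_once(fmt, ...)				\
	do {							\
		static bool example_warned;			\
		if (!example_warned) {				\
			example_warned = true;			\
			example_warn_count++;			\
			fprintf(stderr, fmt, __VA_ARGS__);	\
		}						\
	} while (0)

static void example_create_tx_qp(const char *dev_name)
{
	example_warn_once("can't use GFP_NOIO for QPs on device %s\n",
			  dev_name);
}
```

Calling example_create_tx_qp() for two different device names emits the
message only for the first call; a per-device warning would need the
flag stored with the device instead of at the call site.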
> 

^ permalink raw reply

* Re: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Leon Romanovsky @ 2016-11-01 14:01 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR07MB2207881FAE3C35859A8EDED5F8A10-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

On Tue, Nov 01, 2016 at 01:34:23PM +0000, Amrani, Ram wrote:
> > We use "rdma-core" notations for patches intended to consolidated library,
> > while your patch is for the kernel.
>
> ACK. My bad. Shall I re-send it?

No, there is no need.
The information from [..] is stripped from the actual applied patch.

Thanks

>
> Ram
>

^ permalink raw reply

* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Leon Romanovsky @ 2016-11-01 14:07 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
	Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
	Hefty, Sean, Jason Gunthorpe, Haggai Eran,
	james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <CAG53R5VKwntDHX101+5aaGoyKMKQuiKQWam575iFAxhmKxhE1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Nov 01, 2016 at 04:33:23PM +0530, Parav Pandit wrote:
>
> > 4. Cgroup configuration should be as close as possible to "standard" if
> > such exists, so all infinite internet guides will work for RDMA too.
> I didn't follow this comment. Can you please explain? Are you saying
> the rdma cgroup should define all the objects of the IB spec?

It is not related to the spec at all. There were comments from Tejun and you that
other cgroups (CPU, ...) have different semantics and RDMA has something unique
(I don't remember what it was). I want the RDMA cgroup to be as little unique as possible.

^ permalink raw reply

* RE: [PATCH] qedr: Fix missing unlock on error in qedr_post_send()
From: Amrani, Ram @ 2016-11-01 14:09 UTC (permalink / raw)
  To: Leon Romanovsky, Wei Yongjun
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Borundia, Rajesh,
	Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161031053835.GV3617-2ukJVAZIZ/Y@public.gmane.org>

> > index a615142..e7c7417 100644
> > --- a/drivers/infiniband/hw/qedr/verbs.c
> > +++ b/drivers/infiniband/hw/qedr/verbs.c
> > @@ -2983,7 +2983,8 @@ int qedr_post_send(struct ib_qp *ibqp, struct
> > ib_send_wr *wr,
> >
> >  	if (!wr) {
> >  		DP_ERR(dev, "Got an empty post send.\n");
> > -		return -EINVAL;
> > +		rc = -EINVAL;
> > +		goto out_unlock;
> >  	}
> 
> IMHO, this if needs to be moved before acquiring the spinlock, to avoid
> introducing a new label for this one case only.
> 

Thanks Wei and Leon.

Actually, perhaps we can remove the check altogether.
Since this is kernel space, is it safe to presume that all ULPs are trusted to be well coded?
(If not, and as a side note, I see that in MLX4 there's a for(..;wr;..) loop, but wr is
dereferenced earlier, so this is perhaps a potential bug.)
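
To make the suggestion quoted above concrete, here is a minimal userspace
sketch of checking the argument before taking the lock. It is not the qedr
code: struct send_wr, struct queue_pair and post_send() are hypothetical
stand-ins, and a pthread mutex plays the role of the spinlock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct send_wr {
	struct send_wr *next;
};

struct queue_pair {
	pthread_mutex_t lock;	/* plays the role of the QP spinlock */
	int posted;		/* number of work requests accepted */
};

/*
 * Reject an empty request before taking the lock, so the early
 * return cannot leave the lock held and no unlock label is needed.
 */
static int post_send(struct queue_pair *qp, struct send_wr *wr)
{
	if (!wr)
		return -EINVAL;

	pthread_mutex_lock(&qp->lock);
	for (; wr; wr = wr->next)
		qp->posted++;
	pthread_mutex_unlock(&qp->lock);

	return 0;
}
```

With this ordering, every return that happens after the lock is taken sits
after the matching unlock, which is the property the original patch restored
with the out_unlock label.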

Ram


^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Leon Romanovsky @ 2016-11-01 14:14 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <20161101134823.GA6681-Hxa29pjIrERMGUUWBy6pNA@public.gmane.org>

On Tue, Nov 01, 2016 at 03:48:24PM +0200, Yuval Shaia wrote:
> On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> > From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> > on every QP creation, It becomes worse when driver works in connected mode
> > we will see one print on each new connection, instead do it once.
> >
> > Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> > Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index 4ad297d..917393b 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
> >
> >  	tx_qp = ib_create_qp(priv->pd, &attr);
> >  	if (PTR_ERR(tx_qp) == -EINVAL) {
> > -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > -			   priv->ca->name);
> > +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > +			     priv->ca->name);
>
> But it will still re-print it for different device, right?

Good question.

pr_warn_once is defined as an alias for printk_once [1]. printk_once is
itself a macro [2] that defines a local static __read_mostly flag at the
call site, so the warning is printed only once in total, not once per device.

[1] http://lxr.free-electrons.com/source/include/linux/printk.h#L359
[2] http://lxr.free-electrons.com/source/include/linux/printk.h#L322
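
For illustration, a minimal userspace sketch of the same print-once pattern.
It is not the kernel macro: warn_once(), create_tx_qp() and the emitted
counter are hypothetical names, and the ##__VA_ARGS__ form is a GNU
extension accepted by gcc and clang.

```c
#include <stdbool.h>
#include <stdio.h>

/* Count of messages actually emitted, so the behaviour can be checked. */
static int emitted;

/*
 * Userspace imitation of the print-once pattern: the static flag lives
 * inside the macro expansion, so each place the macro is used gets its
 * own flag. The flag is not keyed on the arguments.
 */
#define warn_once(fmt, ...)					\
do {								\
	static bool warned;					\
	if (!warned) {						\
		warned = true;					\
		emitted++;					\
		fprintf(stderr, fmt, ##__VA_ARGS__);		\
	}							\
} while (0)

/* Hypothetical stand-in for the fallback path in ipoib_cm_create_tx_qp(). */
static void create_tx_qp(const char *ca_name)
{
	warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
		  ca_name);
}
```

Calling create_tx_qp() first with one device name and then with another
emits a single message, because both calls go through the same call site and
therefore share the same flag.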

>
> >  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
> >  		tx_qp = ib_create_qp(priv->pd, &attr);
> >  	}
> > --
> > 2.7.4

^ permalink raw reply

* RE: [PATCH v2 perftest] Support for Chelsio T6 devices
From: Steve Wise @ 2016-11-01 14:16 UTC (permalink / raw)
  To: 'Leon Romanovsky'
  Cc: 'Gil Rockah', 'Zohar Ben Aharon',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161031231124.GA3617-2ukJVAZIZ/Y@public.gmane.org>

> Hi Steve,
> I merged both of your patches
> https://patchwork.kernel.org/patch/9341763/
> https://patchwork.kernel.org/patch/9219151/
> https://github.com/linux-rdma/perftest/pull/1
> 
> Sorry for the delay.

Thanks all!

Steve.

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Yuval Shaia @ 2016-11-01 14:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <20161101141454.GD3617-2ukJVAZIZ/Y@public.gmane.org>

On Tue, Nov 01, 2016 at 04:14:54PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 01, 2016 at 03:48:24PM +0200, Yuval Shaia wrote:
> > On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> > > From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >
> > > Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> > > on every QP creation, It becomes worse when driver works in connected mode
> > > we will see one print on each new connection, instead do it once.
> > >
> > > Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> > > Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > ---
> > >  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > index 4ad297d..917393b 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
> > >
> > >  	tx_qp = ib_create_qp(priv->pd, &attr);
> > >  	if (PTR_ERR(tx_qp) == -EINVAL) {
> > > -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > > -			   priv->ca->name);
> > > +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > > +			     priv->ca->name);
> >
> > But it will still re-print it for different device, right?
> 
> Good question,
> 
> pr_warn_once is defined as an alias for printk_once [1]. printk_once is
> itself a macro [2] that defines a local static __read_mostly flag.
> 
> [1] http://lxr.free-electrons.com/source/include/linux/printk.h#L359
> [2] http://lxr.free-electrons.com/source/include/linux/printk.h#L322

If only one HCA model is installed on the system then it should be fine,
but I wonder: if there is more than one, would we like to see the warning
again? I think yes.

> 
> >
> > >  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
> > >  		tx_qp = ib_create_qp(priv->pd, &attr);
> > >  	}
> > > --
> > > 2.7.4

^ permalink raw reply

* RE: [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Amrani, Ram @ 2016-11-01 14:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Wei Yongjun, Doug Ledford, Borundia, Rajesh, Wei Yongjun,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161031053318.GU3617-2ukJVAZIZ/Y@public.gmane.org>

> While looking on this patch and associated code to it, I noticed the 
> following code stack:
> 
> qedr_create_qp
> 	-->
> 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
> 		-->
> 		qed_rdma_destroy_qp
> 			-->
> 			qed_roce_destroy_qp
> This function will check the QP state, return -EINVAL, and comment
> that this QP needs to be prepared before destroying it.
> 
> However, immediately after returning, you are calling kfree(qp)
> without any checks.
> 
> It looks like an error and is worth taking a look at.
>

That's a deep level of reading... thanks.

When the QP is created, its state is set in ecore_rdma_create_qp():
	qp->cur_state = ECORE_ROCE_QP_STATE_RESET;

When ecore_roce_destroy_qp() is invoked, the QP *must* be in either RESET or one of two other states:
	if ((qp->cur_state != QED_ROCE_QP_STATE_RESET) &&
	    (qp->cur_state != QED_ROCE_QP_STATE_ERR) &&
	    (qp->cur_state != QED_ROCE_QP_STATE_INIT)) {
		DP_NOTICE(p_hwfn,
			  "QP must be in error, reset or init state before destroying it\n");
		return -EINVAL;
	}
So actually, we won't return -EINVAL here.

The bug I see in our upstream code is that for RESET the normal "destroy" operations continue. But they shouldn't.
We need something like this here:
	if (qp->cur_state == ECORE_ROCE_QP_STATE_RESET)
		return 0;

Flow will then return to qed_rdma_destroy_qp(), which will release the qp resource in the qed_roce scope (our real purpose),
and then to qedr_create_qp(), where the qp resource will be released in the qedr scope.
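
As a rough, self-contained sketch of that flow (not the driver code: enum
qp_state, struct qp, the hw_torn_down flag and roce_destroy_qp() are
hypothetical simplifications of the names used above):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of QP states, named after the ones discussed above. */
enum qp_state {
	QP_STATE_RESET,
	QP_STATE_INIT,
	QP_STATE_RTS,
	QP_STATE_ERR,
};

struct qp {
	enum qp_state cur_state;
	bool hw_torn_down;	/* set when the normal destroy operations ran */
};

/*
 * States other than RESET, INIT and ERR are rejected. A QP still in
 * RESET returns success early and skips the normal destroy operations;
 * only INIT and ERR go through them.
 */
static int roce_destroy_qp(struct qp *qp)
{
	if (qp->cur_state != QP_STATE_RESET &&
	    qp->cur_state != QP_STATE_ERR &&
	    qp->cur_state != QP_STATE_INIT)
		return -EINVAL;

	if (qp->cur_state == QP_STATE_RESET)
		return 0;

	qp->hw_torn_down = true;	/* placeholder for the real teardown */
	return 0;
}
```

In this sketch the caller still gets 0 for a QP in RESET, so it can go on
to release its own resources exactly as it does after a full destroy.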

And as a side issue - an improvement that can be added is to return the error code of the QP create and not of the QP destroy.

I'll fix the first later on and probably the second too.



> And did I miss the fix to memory leak posted during code review?
> 
As far as I know, I have supplied patches for all memory leaks. Can you direct me to a specific e-mail?

Thanks,
Ram

^ permalink raw reply

