From: Steve Wise <swise@opengridcomputing.com>
To: Bernard Metzler <BMT@zurich.ibm.com>
Cc: linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
Roland Dreier <rdreier@cisco.com>
Subject: Re: [PATCH] SIW: Object management
Date: Tue, 05 Oct 2010 10:37:49 -0500
Message-ID: <4CAB464D.5030702@opengridcomputing.com>
In-Reply-To: <OFFCB10B70.3E1CC39D-ONC12577B3.0053113C-C12577B3.0054C076@ch.ibm.com>

On 10/05/2010 10:25 AM, Bernard Metzler wrote:
> Steve Wise<swise@opengridcomputing.com> wrote on 10/05/2010 05:02:37 PM:
>
>> On 10/05/2010 09:56 AM, Bernard Metzler wrote:
>>
>>> Steve Wise<swise@opengridcomputing.com> wrote on 10/05/2010 04:26:48 PM:
>>>
>>>> On 10/05/2010 01:54 AM, Bernard Metzler wrote:
>>>>
>>>> <snip>+
>>>>
>>>>
>>>>> +
>>>>> +/***** routines for WQE handling ***/
>>>>> +
>>>>> +/*
>>>>> + * siw_wqe_get()
>>>>> + *
>>>>> + * Get new WQE. For READ RESPONSE, take it from the free list which
>>>>> + * has a maximum size of maximum inbound READs. All other WQE are
>>>>> + * malloc'ed which creates some overhead. Consider change to
>>>>> + *
>>>>> + * 1. malloc WR only if it cannot be synchonously completed, or
>>>>> + * 2. operate own cache of reuseable WQE's.
>>>>> + *
>>>>> + * Current code trusts on malloc efficiency.
>>>>> + */
>>>>> +inline struct siw_wqe *siw_wqe_get(struct siw_qp *qp, enum siw_wr_opcode op)
>>>>> +{
>>>>> +	struct siw_wqe *wqe;
>>>>> +
>>>>> +	if (op == SIW_WR_RDMA_READ_RESP) {
>>>>> +		spin_lock(&qp->freelist_lock);
>>>>> +		if (!(list_empty(&qp->wqe_freelist))) {
>>>>> +			wqe = list_entry(qp->wqe_freelist.next,
>>>>> +					 struct siw_wqe, list);
>>>>> +			list_del(&wqe->list);
>>>>> +			spin_unlock(&qp->freelist_lock);
>>>>> +			wqe->processed = 0;
>>>>> +			dprint(DBG_OBJ|DBG_WR,
>>>>> +				"(QP%d): WQE from FreeList p: %p\n",
>>>>> +				QP_ID(qp), wqe);
>>>>> +		} else {
>>>>> +			spin_unlock(&qp->freelist_lock);
>>>>> +			wqe = NULL;
>>>>> +			dprint(DBG_ON|DBG_OBJ|DBG_WR,
>>>>> +				"(QP%d): FreeList empty!\n", QP_ID(qp));
>>>>> +		}
>>>>> +	} else {
>>>>> +		wqe = kzalloc(sizeof(struct siw_wqe), GFP_KERNEL);
>>>>> +		dprint(DBG_OBJ|DBG_WR, "(QP%d): New WQE p: %p\n",
>>>>> +			QP_ID(qp), wqe);
>>>>> +	}
>>>>>
>>>> I think you can't allocate at GFP_KERNEL here if this is called from the
>>>> post_ functions. I think you might want to pre-allocate these when you
>>>> create the QP...
>>>>
>>> the idea was to keep the memory footprint small and flexible
>>> while using the linux/list.h routines to manipulate all queues
>>> (no ring buffers etc, just lists). at the same time we
>>> decided to take the provided uverbs_cmd-syscall path down to
>>> the driver even for the post_-functions (since we would have to ring a
>>> doorbell on the send path anyway, which in software, is a syscall).
>>> in that path, even ib_uverbs_post_send() does one kmalloc() per wr
>>> (it would be helpful if the provider could keep and reuse that wr of
>>> known size, freeing it later at its own premises. that would avoid
>>> the second kmalloc here.)
>>>
>>> currently only work queue elements which are needed to satisfy
>>> inbound read requests are pre-allocated (amount corresponding
>>> to inbound read queue depth), since the read response is
>>> scheduled in network softirq context which must not sleep.
>>>
>>> that discussion may relate to the spinlock at the entrance to the
>>> post_ verbs. going down the uverbs_cmd path may sleep anyway...?
>>>
>>>
>>>
>>
>> The uverb calls may sleep, but certain kernel verbs must not. Remember,
>> the post_send/recv and other functions in your driver are called
>> directly (almost) by kernel users like NFSRDMA. These users may be
>> calling in an interrupt context and thus you cannot block/sleep.
>>
> OK, very convincing. not a big change since siw_wqe_get/_put()
> already maintain a list of pre-allocated wqe's (currently for
> the read.responses).
> but, would it be ok if the code distinguishes between user
> land and in-kernel consumers? i would be very happy if we could
> keep the pre-allocations per user land connection to its very
> minimum...
>
>
I think that's ok, but it's bending the core locking rules a little, I
guess. But the intent is that kernel users can definitely
send/recv/poll in interrupt context, so possibly blocking for user mode
QPs in non-kernel-bypass operations is probably ok...

What do you think, Roland?

Steve.
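
For illustration only, here is a minimal userspace sketch of the scheme discussed above: pre-allocate WQEs when the QP is created, serve them from a free list, and fall back to a possibly-blocking allocation only for consumers that are allowed to sleep. It is not the SIW code. The names `struct wqe`, `struct wqe_pool`, `may_sleep` and the `wqe_*` helpers are hypothetical, `calloc()` stands in for `kzalloc(..., GFP_KERNEL)`, and the spinlock that would protect the free list in the driver is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct siw_wqe. */
struct wqe {
	struct wqe *next;	/* free-list link */
	int processed;
};

/* Hypothetical per-QP pool; the driver would guard it with a spinlock. */
struct wqe_pool {
	struct wqe *free;	/* singly linked list of idle WQEs */
	bool may_sleep;		/* assumed true only for user-mode QPs */
};

/* Pre-allocate n WQEs at QP creation time, where sleeping is allowed. */
static int wqe_pool_init(struct wqe_pool *pool, size_t n, bool may_sleep)
{
	pool->free = NULL;
	pool->may_sleep = may_sleep;
	for (size_t i = 0; i < n; i++) {
		struct wqe *w = calloc(1, sizeof(*w));

		if (!w)
			return -1;	/* caller cleans up with wqe_pool_destroy() */
		w->next = pool->free;
		pool->free = w;
	}
	return 0;
}

/* Take a WQE from the free list; allocate only if the caller may sleep. */
static struct wqe *wqe_get(struct wqe_pool *pool)
{
	struct wqe *w = pool->free;

	if (w) {
		pool->free = w->next;
		w->next = NULL;
		w->processed = 0;
		return w;
	}
	if (!pool->may_sleep)
		return NULL;	/* atomic caller: fail rather than block */
	return calloc(1, sizeof(*w));	/* stands in for a GFP_KERNEL allocation */
}

/* Return a WQE to the pool for reuse. */
static void wqe_put(struct wqe_pool *pool, struct wqe *w)
{
	assert(pool && w);
	w->next = pool->free;
	pool->free = w;
}

/* Free every idle WQE; WQEs still in use must be put back first. */
static void wqe_pool_destroy(struct wqe_pool *pool)
{
	while (pool->free) {
		struct wqe *w = pool->free;

		pool->free = w->next;
		free(w);
	}
}
```

With this shape, an in-kernel consumer's QP would be initialized with a pool sized to its queue depth and `may_sleep` false, so a post from interrupt context never blocks, while a user-mode QP could start with a small or empty pool and allocate on demand.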
Thread overview: 7+ messages
2010-10-05 6:54 [PATCH] SIW: Object management Bernard Metzler
2010-10-05 14:26 ` Steve Wise
[not found] ` <4CAB35A8.6080906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-10-05 14:56 ` Bernard Metzler
2010-10-05 15:02 ` Steve Wise
2010-10-05 15:25 ` Bernard Metzler
2010-10-05 15:37 ` Steve Wise [this message]
[not found] ` <4CAB464D.5030702-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-10-09 14:10 ` Bernard Metzler