public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Gal Pressman <galpress@amazon.com>
To: "Hefty, Sean" <sean.hefty@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: ibv_req_notify_cq clarification
Date: Tue, 23 Feb 2021 14:23:13 +0200	[thread overview]
Message-ID: <47a7c3a3-ebe1-7fa1-b47e-650ad4d6e736@amazon.com> (raw)
In-Reply-To: <MW3PR11MB46519518F772EAFA758C45229E819@MW3PR11MB4651.namprd11.prod.outlook.com>

On 23/02/2021 0:41, Hefty, Sean wrote:
>>> It eliminates races, if the consumer says 'I read up to X send me an
>>> interrupt if X+1 exists' when X+1 already exists if there is a race
>>> producer has already written it. So send an interrupt.
>>
>> Right, that's what I was getting at in my first question, this isn't the next
>> completion from the device's perspective.
> 
> From the spec:
> 
> Requests the CQ event handler be called when the next completion
> entry of the specified type is added to the specified CQ. The handler
> is called at most once per Request Completion Notification call for a
> particular CQ. Any CQ entries that existed before the notify is enabled
> will not result in a call to the handler.
> 
> It may help to think of it in terms of edge versus level triggered.  The IB spec defines the CQ as behaving similar to an edge triggered object.  I would guess that 99% of developers would successfully write an edge-triggered based application that hangs.  A level triggered wait is much easier to get correct.

Agree, but a level triggered wait contradicts the
"Any CQ entries that existed before the notify is enabled will not result in a
call to the handler."

>> So in such case the consumer index in the arm doorbell is used to indicate what
>> should be considered a "new" completion? This is new from the app perspective.
> 
> The verbs API only guarantees that the CQ will be signaled when a new completion relative to the HW has been written.  If there's a driver that converts the behavior into a level triggered operation, props to them!  That implementation is far more likely to make incorrectly written applications work than ever break a correctly written one.

I agree that this behavior is most likely better, but it contradicts the api
agreement.
It seems like you should write your app according to the provider you're using
in order to get it right.

>> But looking at ibv_ud_pingpong for example, I don't understand how that could
>> even work.
>> The test arms the CQ on creation (consumder index 0), calls ibv_get_cq_event(),
>> wakes up and immediately arms the CQ again (before polling, consumer index is
>> still 0).
> 
> In the worst case, arming the CQ just results in select/poll/epoll returning immediately the next time it is called.  The thread finds the CQ empty, rearms the CQ again, and select/poll/epoll finally block.
> 
>>>> Running this with 32 iterations, the client does something like:
>>>> - arm cq
>>>> - post send x 32
>>>> - wait for cq event
>>>> - arm cq
>>>> - poll cq (once, with batch size of 16)
>>>> - no more post send (reached tot_iters)
>>>> - wait for cq event (but an event has already been generated?)
>>>
>>> I don't know much about perf-test, but in verbs arming a non-empty CQ
>>> is asking for trouble
>>
>> Do you have a way to verify whether this test gets stuck? Maybe I am missing
>> something?
> 
> Looking at the code, I agree that it looks racy in a way that could cause it to hang.  My guess is most people running perftests are looking for the best performance numbers, so blocking completions are rarely enabled.  And even if they were, it's still a race condition as to whether the test would hang.
> 
>> What do you mean by arming a non-empty CQ?
>> The man pages suggest a scheme where the app should call ibv_get_cq_event()
>> followed by an ibv_req_notify_cq(), the CQ polling/emptying comes after these,
>> so at the time of arm the CQ isn't empty.
> 
> The app can do:
> 
> Wakeup
> While read cqe succeeds
> 	Process cqe
> Read cq event
> Arm cq
> /* cqe's may have been written between the last read and arming */
> While read cqe succeeds
> 	Process cqe
> Sleep
> 
> Or shorten this to:
> 
> Wakeup
> Read cq event
> Arm cq
> While read cqe succeeds
> 	Process cqe
> Sleep
> 
> In both cases, all cqe's must be processed after calling arm, and it's possible to read a cq event only to find the cq empty.  One could argue which is more efficient, but we're talking about a sleeping thread in either case.

Yea, this seems correct.
But as I said in my reply to Jason, libibverbs examples, pyverbs tests and
perftest all go from "Read cq event" to "Arm cq" immediately.

  reply	other threads:[~2021-02-23 12:24 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-18  9:13 ibv_req_notify_cq clarification Gal Pressman
2021-02-18 12:38 ` Bernard Metzler
2021-02-18 12:47   ` Gal Pressman
2021-02-18 13:59     ` Bernard Metzler
2021-02-18 12:53 ` Jason Gunthorpe
2021-02-18 15:52   ` Gal Pressman
2021-02-18 16:23     ` Jason Gunthorpe
2021-02-18 18:47       ` Bernard Metzler
2021-02-18 22:22       ` Tom Talpey
2021-02-18 22:51         ` Jason Gunthorpe
2021-02-18 23:07           ` Tom Talpey
2021-02-19  0:45             ` Jason Gunthorpe
2021-02-19 14:31               ` Tom Talpey
2021-02-19 14:42                 ` Jason Gunthorpe
2021-02-21  9:25       ` Gal Pressman
2021-02-22 13:46         ` Jason Gunthorpe
2021-02-22 15:36           ` Gal Pressman
2021-02-22 15:55             ` Jason Gunthorpe
2021-02-22 19:24               ` Gal Pressman
2021-02-22 19:37                 ` Jason Gunthorpe
2021-02-23 12:18                   ` Gal Pressman
2021-02-23 12:38                     ` Jason Gunthorpe
2021-02-22 22:41                 ` Hefty, Sean
2021-02-23 12:23                   ` Gal Pressman [this message]
2021-02-23 20:48                     ` Hefty, Sean
2021-02-22 18:38             ` Hefty, Sean
2021-02-22 19:26               ` Gal Pressman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47a7c3a3-ebe1-7fa1-b47e-650ad4d6e736@amazon.com \
    --to=galpress@amazon.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sean.hefty@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox