From: Jason Gunthorpe <jgg@ziepe.ca>
To: Gal Pressman <galpress@amazon.com>
Cc: Doug Ledford <dledford@redhat.com>,
linux-rdma@vger.kernel.org,
Alexander Matushevsky <matua@amazon.com>,
Firas JahJah <firasj@amazon.com>,
Yossi Leybovich <sleybo@amazon.com>
Subject: Re: [PATCH for-next 2/3] RDMA/efa: Count mmap failures
Date: Sun, 26 Apr 2020 10:30:00 -0300 [thread overview]
Message-ID: <20200426133000.GL26002@ziepe.ca> (raw)
In-Reply-To: <523c9dee-cedf-b762-8a68-cd1232e87e48@amazon.com>
On Sun, Apr 26, 2020 at 09:42:27AM +0300, Gal Pressman wrote:
> On 24/04/2020 21:26, Jason Gunthorpe wrote:
> > On Fri, Apr 24, 2020 at 06:25:54PM +0300, Gal Pressman wrote:
> >> On 24/04/2020 17:59, Jason Gunthorpe wrote:
> >>> On Mon, Apr 20, 2020 at 09:22:12AM +0300, Gal Pressman wrote:
> >>>> Add a new stat that counts mmap failures, which might help when
> >>>> debugging different issues.
> >>>>
> >>>> Reviewed-by: Firas JahJah <firasj@amazon.com>
> >>>> Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
> >>>> Signed-off-by: Gal Pressman <galpress@amazon.com>
> >>>> drivers/infiniband/hw/efa/efa.h | 3 ++-
> >>>> drivers/infiniband/hw/efa/efa_verbs.c | 9 +++++++--
> >>>> 2 files changed, 9 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
> >>>> index aa7396a1588a..77c9ff798117 100644
> >>>> +++ b/drivers/infiniband/hw/efa/efa.h
> >>>> @@ -1,6 +1,6 @@
> >>>> /* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
> >>>> /*
> >>>> - * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
> >>>> + * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
> >>>> */
> >>>>
> >>>> #ifndef _EFA_H_
> >>>> @@ -40,6 +40,7 @@ struct efa_sw_stats {
> >>>> atomic64_t reg_mr_err;
> >>>> atomic64_t alloc_ucontext_err;
> >>>> atomic64_t create_ah_err;
> >>>> + atomic64_t mmap_err;
> >>>> };
> >>>>
> >>>> /* Don't use anything other than atomic64 */
> >>>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
> >>>> index b555845d6c14..75eef1ec2474 100644
> >>>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> >>>> @@ -44,7 +44,8 @@ struct efa_user_mmap_entry {
> >>>> op(EFA_CREATE_CQ_ERR, "create_cq_err") \
> >>>> op(EFA_REG_MR_ERR, "reg_mr_err") \
> >>>> op(EFA_ALLOC_UCONTEXT_ERR, "alloc_ucontext_err") \
> >>>> - op(EFA_CREATE_AH_ERR, "create_ah_err")
> >>>> + op(EFA_CREATE_AH_ERR, "create_ah_err") \
> >>>> + op(EFA_MMAP_ERR, "mmap_err")
> >>>>
> >>>> #define EFA_STATS_ENUM(ename, name) ename,
> >>>> #define EFA_STATS_STR(ename, name) [ename] = name,
> >>>> @@ -1569,6 +1570,7 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
> >>>> ibdev_dbg(&dev->ibdev,
> >>>> "pgoff[%#lx] does not have valid entry\n",
> >>>> vma->vm_pgoff);
> >>>> + atomic64_inc(&dev->stats.sw_stats.mmap_err);
> >>>> return -EINVAL;
> >>>> }
> >>>> entry = to_emmap(rdma_entry);
> >>>> @@ -1604,12 +1606,14 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
> >>>> err = -EINVAL;
> >>>> }
> >>>>
> >>>> - if (err)
> >>>> + if (err) {
> >>>> ibdev_dbg(
> >>>> &dev->ibdev,
> >>>> "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
> >>>> entry->address, rdma_entry->npages * PAGE_SIZE,
> >>>> entry->mmap_flag, err);
> >>>> + atomic64_inc(&dev->stats.sw_stats.mmap_err);
> >>>
> >>> Really? Isn't this something that is only possible with a buggy
> >>> rdma-core provider? Why count it?
> >>
> >> Though unlikely, it could happen, otherwise this error flow wouldn't exist in
> >> the first place.
> >>
> >> If for some reason a customer app steps on a bug we're not aware of, this
> >> counter could serve as a red flag.
> >
> > But there are lots of cases where a buggy provider can cause error
> > exits, why choose this one to count against all the others?
>
> It's not one against all others, most if not all of our userspace facing API
> error flows have a similar counter.
Hurm, seems a bit strange, but OK
> And TBH, I think that the mmap flow is quite convoluted with the cookie response
> from the crate verb, so it deserves a counter IMO.
How so? Userspace takes the u64 from the command and pass it to mmap,
what is convoluted?
Jason
next prev parent reply other threads:[~2020-04-26 13:30 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-20 6:22 [PATCH for-next 0/3] EFA statistics minor updates Gal Pressman
2020-04-20 6:22 ` [PATCH for-next 1/3] RDMA/efa: Report create CQ error counter Gal Pressman
2020-04-20 6:22 ` [PATCH for-next 2/3] RDMA/efa: Count mmap failures Gal Pressman
2020-04-24 14:59 ` Jason Gunthorpe
2020-04-24 15:25 ` Gal Pressman
2020-04-24 18:26 ` Jason Gunthorpe
2020-04-26 6:42 ` Gal Pressman
2020-04-26 13:30 ` Jason Gunthorpe [this message]
2020-04-26 14:17 ` Gal Pressman
2020-04-20 6:22 ` [PATCH for-next 3/3] RDMA/efa: Count admin commands errors Gal Pressman
2020-05-02 23:33 ` [PATCH for-next 0/3] EFA statistics minor updates Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200426133000.GL26002@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=dledford@redhat.com \
--cc=firasj@amazon.com \
--cc=galpress@amazon.com \
--cc=linux-rdma@vger.kernel.org \
--cc=matua@amazon.com \
--cc=sleybo@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.