linux-nfs.vger.kernel.org archive mirror
* NFS over RDMA problem: svcrdma: Error fast registering memory for xprt ffff8803307d7400
@ 2010-07-13  8:45 Gyorgy Jeney
From: Gyorgy Jeney @ 2010-07-13  8:45 UTC (permalink / raw)
  To: linux-nfs

Hello all,

If this mailing list is the wrong place to ask, I apologise, and I would
appreciate it if someone could point me to the right forum.

I am attempting to use NFS over RDMA (over InfiniBand), but I have run into
a problem.  The NFS filesystem can be mounted on the client and works for a
while (files can be read, modified, etc. over the mount), but then, at a
seemingly random time, the NFS server writes these lines to its log:

[ 4380.623922] svcrdma: Error fast registering memory for xprt ffff8803307d7400
[ 4413.343161] svcrdma: error fast registering xdr for xprt ffff8803319edc00

and the NFS filesystem on the client becomes inaccessible.  Is this a
known problem?  Do workarounds exist?  Or am I just missing something?


Both client and server are running the latest vanilla 2.6.34.1 kernel
with Mellanox ConnectX InfiniBand cards.  If more information is
required, please do ask.

BTW: I can reproduce the problem quite reliably by running the bonnie++
"benchmark" on the NFS mounted filesystem.

nog.

ps: I'm not subscribed to the list, please CC me on all replies.


* Re: NFS over RDMA problem: svcrdma: Error fast registering memory for xprt ffff8803307d7400
@ 2010-07-14 21:17   ` Gyorgy Jeney
From: Gyorgy Jeney @ 2010-07-14 21:17 UTC (permalink / raw)
  To: linux-nfs, linux-rdma

> I am attempting to use NFS over RDMA (over infiniband), but there is some
> problem.  The NFS filesystem can be mounted on the client, and things
> will work for some time (can read, modify, etc. the files over the mount),
> but then (at a seemingly random time) the NFS server will dump these
> lines to the logs:
>
> [ 4380.623922] svcrdma: Error fast registering memory for xprt ffff8803307d7400
> [ 4413.343161] svcrdma: error fast registering xdr for xprt ffff8803319edc00

Digging into it further, it seems the Mellanox InfiniBand driver could
somehow be involved.  After adding some traces to the code, it is clear
that something like this is happening:

At some point sq_cq_reap() is called, with a call chain like this:

  sq_cq_reap()
    ib_poll_cq()
      mlx4_ib_poll_cq()
        mlx4_ib_poll_one()
          mlx4_ib_handle_error_cqe()
            - Which then sets wc->status to IB_WC_WR_FLUSH_ERR rather
              often, but the killer blow seems to be when
              IB_WC_REM_ACCESS_ERR is set.
    - Because of the earlier error, sq_cq_reap() sets the XPT_CLOSE
      flag

Then, sometime later:

  fast_reg_read_chunks()
    svc_rdma_fastreg()
      svc_rdma_send()
        svc_rdma_send()
          - XPT_CLOSE is set and hence -ENOTCONN is returned
    - Since svc_rdma_fastreg() had an error, fast_reg_read_chunks() bails
      and the client then seems to hang.
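
To make the two traces above easier to follow, here is a minimal userspace
sketch of the sequence as I understand it.  It is not the kernel code: every
mock_* name, the struct layout and the enum values are invented for
illustration, and the only behaviour modelled is what the traces show (a
failed completion marks the transport for close, and a later send on that
transport returns -ENOTCONN).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the completion statuses named in the trace.
 * The numeric values are placeholders, not the kernel's. */
enum mock_wc_status {
    MOCK_WC_SUCCESS,
    MOCK_WC_WR_FLUSH_ERR,
    MOCK_WC_REM_ACCESS_ERR,
};

/* Hypothetical transport; close_pending stands in for XPT_CLOSE. */
struct mock_xprt {
    bool close_pending;
};

/* Models the sq_cq_reap() step: any completion that is not a success
 * marks the transport for close.  Returns the number of failures seen. */
int mock_sq_cq_reap(struct mock_xprt *xprt,
                    const enum mock_wc_status *wc, size_t n)
{
    int failed = 0;

    assert(xprt != NULL);
    for (size_t i = 0; i < n; i++) {
        if (wc[i] != MOCK_WC_SUCCESS) {
            xprt->close_pending = true;
            failed++;
        }
    }
    return failed;
}

/* Models the send step: refuse to post once close is pending. */
int mock_send(struct mock_xprt *xprt)
{
    assert(xprt != NULL);
    return xprt->close_pending ? -ENOTCONN : 0;
}

/* Models svc_rdma_fastreg(): posting the registration is just a send. */
int mock_fastreg(struct mock_xprt *xprt)
{
    return mock_send(xprt);
}

/* Models fast_reg_read_chunks(): bail out if registration failed.
 * This is the point where the "Error fast registering memory" line
 * would be logged. */
int mock_fast_reg_read_chunks(struct mock_xprt *xprt)
{
    int ret = mock_fastreg(xprt);

    if (ret)
        return ret;
    /* ...the RDMA reads would be posted here... */
    return 0;
}
```

Calling mock_sq_cq_reap() with one failed completion and then
mock_fast_reg_read_chunks() on the same transport gives -ENOTCONN, which
matches the order of events in the traces.  The sketch says nothing about
why the completion failed in the first place.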

I'd ask the InfiniBand guys: what do IB_WC_WR_FLUSH_ERR and
IB_WC_REM_ACCESS_ERR mean?  Are they something drastic that should result
in hangs?

nog.

> Both client and server are running the latest vanilla 2.6.34.1 kernel
> with Mellanox Connect-X infiniband cards.  If more information is
> required, please do ask.
>
> BTW: I can reproduce the problem quite reliably by running the bonnie++
> "benchmark" on the NFS mounted filesystem.
>
> nog.
>
> ps: I'm not subscribed to the list, please CC me on all replies.
