From: Tom Tucker <tom@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>
Cc: ericvh@gmail.com, linux-kernel@vger.kernel.org,
v9fs-devel@vger.kernel.org, rminnich@dancer.ca.sandia.gov,
lionkov@lanl.gov
Subject: Re: [PATCH 01/03] 9prdma: RDMA Transport Support for 9P
Date: Mon, 06 Oct 2008 13:10:38 -0500 [thread overview]
Message-ID: <48EA549E.1050001@opengridcomputing.com> (raw)
In-Reply-To: <adaod23d1if.fsf@cisco.com>
Roland Dreier wrote:
> very cool... neat that it only takes 1000 lines of code to do this too.
>
> a few quick comments from a cursory read:
>
> > This file implements the RDMA transport provider for 9P. It allows
> > mounts to be performed over iWARP and IB capable network interfaces
> > and uses the OpenFabrics API to perform I/O.
>
> I don't like this "openfabrics API" terminology -- the RDMA API is just
> one part of the kernel API that you're using, and I'm not sure it's
> worth calling out specially. ie just delete that third line from the
> changelog.
>
Ok.
> > + atomic_t next_tag;
>
> this seems to be a write-only field... delete it?
This is left over garbage. Nice catch.
>
> > + * @wc_op: Mellanox's broken HW doesn't provide the original WR op
> > + * when the CQE completes in error. This forces apps to keep track of
> > + * the op themselves. Yes, it's a Pet Peeve of mine ;-)
>
> there's nothing broken about this behavior -- the IB spec very
> explicitly calls out that the opcode field is undefined for completions
> with error status.
>
Fair enough...
> > +find_req(struct p9_trans_rdma *rdma, u16 tag)
>
> > + return found ? req : 0;
>
> using 0 instead of NULL here... probably worth running all this through
> sparse to see what else if flags.
>
will do.
> > + if (atomic_inc_return(&rdma->sq_count) >= rdma->sq_depth)
> > + wait_event_interruptible
> > + (rdma->send_wait,
> > + (atomic_read(&rdma->sq_count) < rdma->sq_depth));
>
> this worries me a bit... for one thing you use wait_event_interruptible()
> without checking the return value, so it's entirely possible that this
> gets woken up before the condition actually becomes true. And to handle
> that correctly, you'd need extra code to decrement the sq_count and wake
> up other waiters if it was interrupted, etc.
>
Yes, I think this needs to be wait_event.
> I know counting semaphores are out of favor in the kernel but this seems
> to be a good place to use real semaphores, rather than concocting your
> own in a much more complicated way.
>
> > + wait_for_completion_interruptible(&rdma->cm_done);
>
> similarly there are lots of places you do this interruptible wait but
> then don't check if you were interrupted.
>
Well. The only place I do that is when the connection is being
established. And actually, I don't(didn't?) really care why it stopped
waiting, I only care about the state of the connection and if it's not
what I need it to be I error out. I suppose there is a theoretical race
here where the user cancels the wait with ctrl-C, but the connection
keeps going because the transport always gets back fast enough. This
would eventually fail in rdma_rpc.
I'll add the code to check so that it's obvious.
> > + if (0 == (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
> > + rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
> > + if (IS_ERR(rdma->dma_mr))
> > + goto error;
> > + rdma->lkey = rdma->dma_mr->lkey;
> > + } else
> > + rdma->lkey = rdma->cm_id->device->local_dma_lkey;
>
> seems to me this could be written more naturally and avoid the "if
> (0==(test))" construction (which I always have to reread to parse) if
> you reverse the test:
>
> if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> rdma->lkey = rdma->cm_id->device->local_dma_lkey;
> } else {
> rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
> if (IS_ERR(rdma->dma_mr))
> goto error;
> rdma->lkey = rdma->dma_mr->lkey;
> }
>
agree. thanks.
> The only place I see a send request posted has:
>
> > + wr.opcode = IB_WR_SEND;
>
> so I guess your protocol is only send/receives (ie no RDMA in the
> pedantic sense, just two-sided operations). I'm wondering what the
> motivation for all this is: is using send/receive on IB/iWARP a big
> performance win over non-offloaded transport? Even compared to TCP on a
> NIC with LSO and (software) LRO?
>
For IB it avoids IPoIB. Which I think is a big win. For iWARP, it's not
so clear. There is the potential for RDMA ops in the future.
> Is your protocol documented anywhere? It seems npfs has support for the
> same transport.
>
When using SEND/RECV, there isn't a protocol other than 9P itself. Once
we add RDMA ops, one will be required to exchange RKEY, etc... But maybe
I don't understand the question?
> Finally, you create your QP with:
>
> > + qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
>
> but you unconditionally do:
>
> > + wr.send_flags = IB_SEND_SIGNALED;
>
> not a big deal but you could save one line of code and a correspondingly
> tiny amount of .text by just making the send queue signal all requests.
>
ok.
Thanks for the review,
Tom
> - R.
next prev parent reply other threads:[~2008-10-06 18:10 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-02 0:08 [PATCH 00/03] RDMA Transport Support for 9P Tom Tucker
2008-10-02 0:08 ` [PATCH 01/03] 9prdma: " Tom Tucker
2008-10-02 0:08 ` [PATCH 02/03] 9prdma: Makefile change for the RDMA transport Tom Tucker
2008-10-02 0:08 ` [PATCH 03/03] 9prdma: Kconfig changes " Tom Tucker
2008-10-02 5:11 ` [PATCH 01/03] 9prdma: RDMA Transport Support for 9P Roland Dreier
2008-10-06 18:10 ` Tom Tucker [this message]
2008-10-06 22:35 ` Roland Dreier
-- strict thread matches above, loose matches on Subject: below --
2008-10-06 15:56 [PATCH 00/03] " Tom Tucker
2008-10-06 15:56 ` [PATCH 01/03] 9prdma: " Tom Tucker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48EA549E.1050001@opengridcomputing.com \
--to=tom@opengridcomputing.com \
--cc=ericvh@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lionkov@lanl.gov \
--cc=rdreier@cisco.com \
--cc=rminnich@dancer.ca.sandia.gov \
--cc=v9fs-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox