public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tom Tucker <tom@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>
Cc: ericvh@gmail.com, linux-kernel@vger.kernel.org,
	v9fs-devel@vger.kernel.org, rminnich@dancer.ca.sandia.gov,
	lionkov@lanl.gov
Subject: Re: [PATCH 01/03] 9prdma: RDMA Transport Support for 9P
Date: Mon, 06 Oct 2008 13:10:38 -0500	[thread overview]
Message-ID: <48EA549E.1050001@opengridcomputing.com> (raw)
In-Reply-To: <adaod23d1if.fsf@cisco.com>

Roland Dreier wrote:
> very cool... neat that it only takes 1000 lines of code to do this too.
> 
> a few quick comments from a cursory read:
> 
>  > This file implements the RDMA transport provider for 9P. It allows
>  > mounts to be performed over iWARP and IB capable network interfaces
>  > and uses the OpenFabrics API to perform I/O.
> 
> I don't like this "openfabrics API" terminology -- the RDMA API is just
> one part of the kernel API that you're using, and I'm not sure it's
> worth calling out specially.  ie just delete that third line from the
> changelog.
> 

Ok.

>  > +	atomic_t next_tag;
> 
> this seems to be a write-only field... delete it?

This is left over garbage. Nice catch.

> 
>  > + * @wc_op: Mellanox's broken HW doesn't provide the original WR op
>  > + * when the CQE completes in error. This forces apps to keep track of
>  > + * the op themselves. Yes, it's a Pet Peeve of mine ;-)
> 
> there's nothing broken about this behavior -- the IB spec very
> explicitly calls out that the opcode field is undefined for completions
> with error status.
>

Fair enough...

>  > +find_req(struct p9_trans_rdma *rdma, u16 tag)
> 
>  > +	return found ? req : 0;
> 
> using 0 instead of NULL here... probably worth running all this through
> sparse to see what else if flags.
>

will do.


>  > +	if (atomic_inc_return(&rdma->sq_count) >= rdma->sq_depth)
>  > +		wait_event_interruptible
>  > +			(rdma->send_wait,
>  > +			 (atomic_read(&rdma->sq_count) < rdma->sq_depth));
> 
> this worries me a bit... for one thing you use wait_event_interruptible()
> without checking the return value, so it's entirely possible that this
> gets woken up before the condition actually becomes true.  And to handle
> that correctly, you'd need extra code to decrement the sq_count and wake
> up other waiters if it was interrupted, etc.
>

Yes, I think this needs to be wait_event.

> I know counting semaphores are out of favor in the kernel but this seems
> to be a good place to use real semaphores, rather than concocting your
> own in a much more complicated way.
> 


>  > +	wait_for_completion_interruptible(&rdma->cm_done);
> 
> similarly there are lots of places you do this interruptible wait but
> then don't check if you were interrupted.
> 

Well. The only place I do that is when the connection is being 
established. And actually, I don't(didn't?) really care why it stopped 
waiting, I  only care about the state of the connection and if it's not 
what I need it to be I error out. I suppose there is a theoretical race 
here where the user cancels the wait with ctrl-C, but the connection 
keeps going because the transport always gets back fast enough. This 
would eventually fail in rdma_rpc.

I'll add the code to check so that it's obvious.

>  > +	if (0 == (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
>  > +		rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
>  > +		if (IS_ERR(rdma->dma_mr))
>  > +			goto error;
>  > +		rdma->lkey = rdma->dma_mr->lkey;
>  > +	} else
>  > +		rdma->lkey = rdma->cm_id->device->local_dma_lkey;
> 
> seems to me this could be written more naturally and avoid the "if
> (0==(test))" construction (which I always have to reread to parse) if
> you reverse the test:
> 
> 	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> 		rdma->lkey = rdma->cm_id->device->local_dma_lkey;
> 	} else {
> 		rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
> 		if (IS_ERR(rdma->dma_mr))
> 			goto error;
> 		rdma->lkey = rdma->dma_mr->lkey;
> 	}
> 
agree. thanks.

> The only place I see a send request posted has:
> 
>  > +	wr.opcode = IB_WR_SEND;
> 
> so I guess your protocol is only send/receives (ie no RDMA in the
> pedantic sense, just two-sided operations).  I'm wondering what the
> motivation for all this is: is using send/receive on IB/iWARP a big
> performance win over non-offloaded transport?  Even compared to TCP on a
> NIC with LSO and (software) LRO?
> 

For IB it avoids IPoIB. Which I think is a big win. For iWARP, it's  not 
so clear. There is the potential for RDMA ops in the future.

> Is your protocol documented anywhere?  It seems npfs has support for the
> same transport.
> 

When using SEND/RECV, there isn't a protocol other than 9P itself. Once 
we add RDMA ops, one will be required to exchange RKEY, etc... But maybe 
I don't understand the question?

> Finally, you create your QP with:
> 
>  > +	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> 
> but you unconditionally do:
> 
>  > +	wr.send_flags = IB_SEND_SIGNALED;
> 
> not a big deal but you could save one line of code and a correspondingly
> tiny amount of .text by just making the send queue signal all requests.
> 

ok.

Thanks for the review,
Tom

>  - R.


  reply	other threads:[~2008-10-06 18:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-02  0:08 [PATCH 00/03] RDMA Transport Support for 9P Tom Tucker
2008-10-02  0:08 ` [PATCH 01/03] 9prdma: " Tom Tucker
2008-10-02  0:08   ` [PATCH 02/03] 9prdma: Makefile change for the RDMA transport Tom Tucker
2008-10-02  0:08     ` [PATCH 03/03] 9prdma: Kconfig changes " Tom Tucker
2008-10-02  5:11   ` [PATCH 01/03] 9prdma: RDMA Transport Support for 9P Roland Dreier
2008-10-06 18:10     ` Tom Tucker [this message]
2008-10-06 22:35       ` Roland Dreier
  -- strict thread matches above, loose matches on Subject: below --
2008-10-06 15:56 [PATCH 00/03] " Tom Tucker
2008-10-06 15:56 ` [PATCH 01/03] 9prdma: " Tom Tucker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48EA549E.1050001@opengridcomputing.com \
    --to=tom@opengridcomputing.com \
    --cc=ericvh@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lionkov@lanl.gov \
    --cc=rdreier@cisco.com \
    --cc=rminnich@dancer.ca.sandia.gov \
    --cc=v9fs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox