Re: [PATCH 01/03] 9prdma: RDMA Transport Support for 9P

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tom Tucker <tom@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>
Cc: ericvh@gmail.com, linux-kernel@vger.kernel.org,
	v9fs-devel@vger.kernel.org, rminnich@dancer.ca.sandia.gov,
	lionkov@lanl.gov
Subject: Re: [PATCH 01/03] 9prdma: RDMA Transport Support for 9P
Date: Mon, 06 Oct 2008 13:10:38 -0500	[thread overview]
Message-ID: <48EA549E.1050001@opengridcomputing.com> (raw)
In-Reply-To: <adaod23d1if.fsf@cisco.com>

Roland Dreier wrote:
> very cool... neat that it only takes 1000 lines of code to do this too.
> 
> a few quick comments from a cursory read:
> 
>  > This file implements the RDMA transport provider for 9P. It allows
>  > mounts to be performed over iWARP and IB capable network interfaces
>  > and uses the OpenFabrics API to perform I/O.
> 
> I don't like this "openfabrics API" terminology -- the RDMA API is just
> one part of the kernel API that you're using, and I'm not sure it's
> worth calling out specially.  ie just delete that third line from the
> changelog.
> 

Ok.

>  > +	atomic_t next_tag;
> 
> this seems to be a write-only field... delete it?

This is left over garbage. Nice catch.

> 
>  > + * @wc_op: Mellanox's broken HW doesn't provide the original WR op
>  > + * when the CQE completes in error. This forces apps to keep track of
>  > + * the op themselves. Yes, it's a Pet Peeve of mine ;-)
> 
> there's nothing broken about this behavior -- the IB spec very
> explicitly calls out that the opcode field is undefined for completions
> with error status.
>

Fair enough...

>  > +find_req(struct p9_trans_rdma *rdma, u16 tag)
> 
>  > +	return found ? req : 0;
> 
> using 0 instead of NULL here... probably worth running all this through
> sparse to see what else if flags.
>

will do.


>  > +	if (atomic_inc_return(&rdma->sq_count) >= rdma->sq_depth)
>  > +		wait_event_interruptible
>  > +			(rdma->send_wait,
>  > +			 (atomic_read(&rdma->sq_count) < rdma->sq_depth));
> 
> this worries me a bit... for one thing you use wait_event_interruptible()
> without checking the return value, so it's entirely possible that this
> gets woken up before the condition actually becomes true.  And to handle
> that correctly, you'd need extra code to decrement the sq_count and wake
> up other waiters if it was interrupted, etc.
>

Yes, I think this needs to be wait_event.

> I know counting semaphores are out of favor in the kernel but this seems
> to be a good place to use real semaphores, rather than concocting your
> own in a much more complicated way.
> 


>  > +	wait_for_completion_interruptible(&rdma->cm_done);
> 
> similarly there are lots of places you do this interruptible wait but
> then don't check if you were interrupted.
> 

Well. The only place I do that is when the connection is being 
established. And actually, I don't(didn't?) really care why it stopped 
waiting, I  only care about the state of the connection and if it's not 
what I need it to be I error out. I suppose there is a theoretical race 
here where the user cancels the wait with ctrl-C, but the connection 
keeps going because the transport always gets back fast enough. This 
would eventually fail in rdma_rpc.

I'll add the code to check so that it's obvious.

>  > +	if (0 == (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
>  > +		rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
>  > +		if (IS_ERR(rdma->dma_mr))
>  > +			goto error;
>  > +		rdma->lkey = rdma->dma_mr->lkey;
>  > +	} else
>  > +		rdma->lkey = rdma->cm_id->device->local_dma_lkey;
> 
> seems to me this could be written more naturally and avoid the "if
> (0==(test))" construction (which I always have to reread to parse) if
> you reverse the test:
> 
> 	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> 		rdma->lkey = rdma->cm_id->device->local_dma_lkey;
> 	} else {
> 		rdma->dma_mr = ib_get_dma_mr(rdma->pd, IB_ACCESS_LOCAL_WRITE);
> 		if (IS_ERR(rdma->dma_mr))
> 			goto error;
> 		rdma->lkey = rdma->dma_mr->lkey;
> 	}
> 
agree. thanks.

> The only place I see a send request posted has:
> 
>  > +	wr.opcode = IB_WR_SEND;
> 
> so I guess your protocol is only send/receives (ie no RDMA in the
> pedantic sense, just two-sided operations).  I'm wondering what the
> motivation for all this is: is using send/receive on IB/iWARP a big
> performance win over non-offloaded transport?  Even compared to TCP on a
> NIC with LSO and (software) LRO?
> 

For IB it avoids IPoIB. Which I think is a big win. For iWARP, it's  not 
so clear. There is the potential for RDMA ops in the future.

> Is your protocol documented anywhere?  It seems npfs has support for the
> same transport.
> 

When using SEND/RECV, there isn't a protocol other than 9P itself. Once 
we add RDMA ops, one will be required to exchange RKEY, etc... But maybe 
I don't understand the question?

> Finally, you create your QP with:
> 
>  > +	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> 
> but you unconditionally do:
> 
>  > +	wr.send_flags = IB_SEND_SIGNALED;
> 
> not a big deal but you could save one line of code and a correspondingly
> tiny amount of .text by just making the send queue signal all requests.
> 

ok.

Thanks for the review,
Tom

>  - R.

next prev parent reply	other threads:[~2008-10-06 18:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-02  0:08 [PATCH 00/03] RDMA Transport Support for 9P Tom Tucker
2008-10-02  0:08 ` [PATCH 01/03] 9prdma: " Tom Tucker
2008-10-02  0:08   ` [PATCH 02/03] 9prdma: Makefile change for the RDMA transport Tom Tucker
2008-10-02  0:08     ` [PATCH 03/03] 9prdma: Kconfig changes " Tom Tucker
2008-10-02  5:11   ` [PATCH 01/03] 9prdma: RDMA Transport Support for 9P Roland Dreier
2008-10-06 18:10     ` Tom Tucker [this message]
2008-10-06 22:35       ` Roland Dreier
  -- strict thread matches above, loose matches on Subject: below --
2008-10-06 15:56 [PATCH 00/03] " Tom Tucker
2008-10-06 15:56 ` [PATCH 01/03] 9prdma: " Tom Tucker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48EA549E.1050001@opengridcomputing.com \
    --to=tom@opengridcomputing.com \
    --cc=ericvh@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lionkov@lanl.gov \
    --cc=rdreier@cisco.com \
    --cc=rminnich@dancer.ca.sandia.gov \
    --cc=v9fs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.