netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH for-next v5 0/7] RDMA/rxe: Make multicast work
       [not found] <20231205002322.10143-1-rpearsonhpe@gmail.com>
@ 2023-12-05  5:50 ` Zhu Yanjun
  2023-12-07 16:15   ` David Ahern
       [not found] ` <20231205002322.10143-2-rpearsonhpe@gmail.com>
       [not found] ` <20231205002322.10143-3-rpearsonhpe@gmail.com>
  2 siblings, 1 reply; 4+ messages in thread
From: Zhu Yanjun @ 2023-12-05  5:50 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma, dsahern, davem@davemloft.net,
	netdev@vger.kernel.org

Add  David S. Miller and  David Ahern.

They are the maintainers in netdev and very familiar with mcast.

Zhu Yanjun

在 2023/12/5 8:23, Bob Pearson 写道:
> After developing a test program which exercises node to node
> testing of RoCE multicast it became clear that there are a
> number of issues with the current rdma_rxe multicast implementation.
>
> The issues seen include:
> 	- There is no support for IPV4 multicast addresses.
> 	- Once a multicast MAC is added it is not removed.
> 	- Multicast packets are sent with the wrong QP number.
> 	- Multicast IP addresses are not created and without
> 	  them no multicast packets received on the interface
> 	  are delivered to the rdma_rxe driver.
> 	- The implementation in rxe_mcast.c is potentially
> 	  racy if multiple simultaneous attach/detach operations
> 	  are issued.
>
> This patch set fixes these issues.
> ---
> v5:
>    Missed previous fix in error path. Add sock lock around
>    ipv6_sock_mm_drop() in rxe_mcast_add6().
> v4:
>    Corrected a lockdep bug reported by Zhu Yanjun.
> v3:
>    Removed rxe_loop_and_send(). It turns out it is not needed.
>    Added module parameters to set mcast limits to small values when
>    driver is loaded to enable mcast limit testing.
>    Rebased to current for-next branch.
> v2:
>    Respond to comments by Zhu.
>    Added more Fixes lines.
>    Added some more explanation in the commit messages.
>    Fixed an error in rxe_lookup_mcg. Should have checked
> 	the return from rxe_get_mcg().
>
> Bob Pearson (7):
>    RDMA/rxe: Cleanup rxe_ah/av_chk_attr
>    RDMA/rxe: Fix sending of mcast packets
>    RDMA/rxe: Register IP mcast address
>    RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
>    RDMA/rxe: Split multicast lock
>    RDMA/rxe: Cleanup mcg lifetime
>    RDMA/rxe: Add module parameters for mcast limits
>
>   drivers/infiniband/sw/rxe/Makefile     |   3 +-
>   drivers/infiniband/sw/rxe/rxe.c        |   8 +-
>   drivers/infiniband/sw/rxe/rxe_av.c     |  50 +--
>   drivers/infiniband/sw/rxe/rxe_loc.h    |   6 +-
>   drivers/infiniband/sw/rxe/rxe_mcast.c  | 526 +++++++++++--------------
>   drivers/infiniband/sw/rxe/rxe_net.c    |   6 +-
>   drivers/infiniband/sw/rxe/rxe_net.h    |   1 +
>   drivers/infiniband/sw/rxe/rxe_opcode.h |   2 +-
>   drivers/infiniband/sw/rxe/rxe_param.c  |  23 ++
>   drivers/infiniband/sw/rxe/rxe_param.h  |   4 +
>   drivers/infiniband/sw/rxe/rxe_qp.c     |   4 +-
>   drivers/infiniband/sw/rxe/rxe_recv.c   |  11 +-
>   drivers/infiniband/sw/rxe/rxe_req.c    |  11 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   5 +-
>   15 files changed, 305 insertions(+), 360 deletions(-)
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next v5 1/7] RDMA/rxe: Cleanup rxe_ah/av_chk_attr
       [not found] ` <20231205002322.10143-2-rpearsonhpe@gmail.com>
@ 2023-12-05  5:52   ` Zhu Yanjun
  0 siblings, 0 replies; 4+ messages in thread
From: Zhu Yanjun @ 2023-12-05  5:52 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma, dsahern, davem@davemloft.net,
	netdev@vger.kernel.org

Add  David S. Miller and  David Ahern.

They are the maintainers in netdev and very familiar with mcast.

Zhu Yanjun

在 2023/12/5 8:23, Bob Pearson 写道:
> Replace rxe_ah_chk_attr() and rxe_av_chk_attr() by a single
> routine rxe_chk_ah_attr().
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_av.c    | 43 ++++-----------------------
>   drivers/infiniband/sw/rxe/rxe_loc.h   |  3 +-
>   drivers/infiniband/sw/rxe/rxe_qp.c    |  4 +--
>   drivers/infiniband/sw/rxe/rxe_verbs.c |  5 ++--
>   4 files changed, 12 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
> index 889d7adbd455..4ac17b8def28 100644
> --- a/drivers/infiniband/sw/rxe/rxe_av.c
> +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> @@ -14,45 +14,24 @@ void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
>   	memcpy(av->dmac, attr->roce.dmac, ETH_ALEN);
>   }
>   
> -static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
> +int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr)
>   {
>   	const struct ib_global_route *grh = rdma_ah_read_grh(attr);
> -	struct rxe_port *port;
> -	struct rxe_dev *rxe;
> -	struct rxe_qp *qp;
> -	struct rxe_ah *ah;
> +	struct rxe_port *port = &rxe->port;
>   	int type;
>   
> -	if (obj_is_ah) {
> -		ah = obj;
> -		rxe = to_rdev(ah->ibah.device);
> -	} else {
> -		qp = obj;
> -		rxe = to_rdev(qp->ibqp.device);
> -	}
> -
> -	port = &rxe->port;
> -
>   	if (rdma_ah_get_ah_flags(attr) & IB_AH_GRH) {
>   		if (grh->sgid_index > port->attr.gid_tbl_len) {
> -			if (obj_is_ah)
> -				rxe_dbg_ah(ah, "invalid sgid index = %d\n",
> -						grh->sgid_index);
> -			else
> -				rxe_dbg_qp(qp, "invalid sgid index = %d\n",
> -						grh->sgid_index);
> +			rxe_dbg_dev(rxe, "invalid sgid index = %d\n",
> +					grh->sgid_index);
>   			return -EINVAL;
>   		}
>   
>   		type = rdma_gid_attr_network_type(grh->sgid_attr);
>   		if (type < RDMA_NETWORK_IPV4 ||
>   		    type > RDMA_NETWORK_IPV6) {
> -			if (obj_is_ah)
> -				rxe_dbg_ah(ah, "invalid network type for rdma_rxe = %d\n",
> -						type);
> -			else
> -				rxe_dbg_qp(qp, "invalid network type for rdma_rxe = %d\n",
> -						type);
> +			rxe_dbg_dev(rxe, "invalid network type for rdma_rxe = %d\n",
> +					type);
>   			return -EINVAL;
>   		}
>   	}
> @@ -60,16 +39,6 @@ static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
>   	return 0;
>   }
>   
> -int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr)
> -{
> -	return chk_attr(qp, attr, false);
> -}
> -
> -int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr)
> -{
> -	return chk_attr(ah, attr, true);
> -}
> -
>   void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
>   		     struct rdma_ah_attr *attr)
>   {
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 4d2a8ef52c85..3d2504a0ae56 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -9,8 +9,7 @@
>   
>   /* rxe_av.c */
>   void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
> -int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr);
> -int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr);
> +int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
>   void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
>   		     struct rdma_ah_attr *attr);
>   void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 28e379c108bc..c28005db032d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -456,11 +456,11 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
>   			goto err1;
>   	}
>   
> -	if (mask & IB_QP_AV && rxe_av_chk_attr(qp, &attr->ah_attr))
> +	if (mask & IB_QP_AV && rxe_chk_ah_attr(rxe, &attr->ah_attr))
>   		goto err1;
>   
>   	if (mask & IB_QP_ALT_PATH) {
> -		if (rxe_av_chk_attr(qp, &attr->alt_ah_attr))
> +		if (rxe_chk_ah_attr(rxe, &attr->alt_ah_attr))
>   			goto err1;
>   		if (!rdma_is_port_valid(&rxe->ib_dev, attr->alt_port_num))  {
>   			rxe_dbg_qp(qp, "invalid alt port %d\n", attr->alt_port_num);
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 48f86839d36a..6706d540f1f6 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -286,7 +286,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
>   	/* create index > 0 */
>   	ah->ah_num = ah->elem.index;
>   
> -	err = rxe_ah_chk_attr(ah, init_attr->ah_attr);
> +	err = rxe_chk_ah_attr(rxe, init_attr->ah_attr);
>   	if (err) {
>   		rxe_dbg_ah(ah, "bad attr");
>   		goto err_cleanup;
> @@ -322,10 +322,11 @@ static int rxe_create_ah(struct ib_ah *ibah,
>   
>   static int rxe_modify_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr)
>   {
> +	struct rxe_dev *rxe = to_rdev(ibah->device);
>   	struct rxe_ah *ah = to_rah(ibah);
>   	int err;
>   
> -	err = rxe_ah_chk_attr(ah, attr);
> +	err = rxe_chk_ah_attr(rxe, attr);
>   	if (err) {
>   		rxe_dbg_ah(ah, "bad attr");
>   		goto err_out;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next v5 2/7] RDMA/rxe: Fix sending of mcast packets
       [not found] ` <20231205002322.10143-3-rpearsonhpe@gmail.com>
@ 2023-12-05  5:53   ` Zhu Yanjun
  0 siblings, 0 replies; 4+ messages in thread
From: Zhu Yanjun @ 2023-12-05  5:53 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma, dsahern, davem@davemloft.net,
	netdev@vger.kernel.org

Add  David S. Miller and  David Ahern.

They are the maintainers in netdev and very familiar with mcast.

Zhu Yanjun

在 2023/12/5 8:23, Bob Pearson 写道:
> Currently the rdma_rxe driver does not send mcast packets correctly.
> It uses the wrong qp number for the packets.
>
> Add a mask bit to indicate that a multicast packet has been locally
> sent and use to set the correct qpn for multicast packets and
> identify mcast packets when sending.
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_av.c     |  7 +++++++
>   drivers/infiniband/sw/rxe/rxe_loc.h    |  1 +
>   drivers/infiniband/sw/rxe/rxe_net.c    |  4 +++-
>   drivers/infiniband/sw/rxe/rxe_opcode.h |  2 +-
>   drivers/infiniband/sw/rxe/rxe_recv.c   |  4 ++++
>   drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++++++--
>   6 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
> index 4ac17b8def28..022173eb5d75 100644
> --- a/drivers/infiniband/sw/rxe/rxe_av.c
> +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> @@ -7,6 +7,13 @@
>   #include "rxe.h"
>   #include "rxe_loc.h"
>   
> +bool rxe_is_mcast_av(struct rxe_av *av)
> +{
> +	struct in6_addr *daddr = (struct in6_addr *)av->grh.dgid.raw;
> +
> +	return rdma_is_multicast_addr(daddr);
> +}
> +
>   void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
>   {
>   	rxe_av_from_attr(rdma_ah_get_port_num(attr), av, attr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 3d2504a0ae56..62b2b25903fc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -8,6 +8,7 @@
>   #define RXE_LOC_H
>   
>   /* rxe_av.c */
> +bool rxe_is_mcast_av(struct rxe_av *av);
>   void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
>   int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
>   void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index cd59666158b1..58c3f3759bf0 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -431,7 +431,9 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>   
>   	rxe_icrc_generate(skb, pkt);
>   
> -	if (pkt->mask & RXE_LOOPBACK_MASK)
> +	if (pkt->mask & RXE_MCAST_MASK)
> +		err = rxe_send(skb, pkt);
> +	else if (pkt->mask & RXE_LOOPBACK_MASK)
>   		err = rxe_loopback(skb, pkt);
>   	else
>   		err = rxe_send(skb, pkt);
> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
> index 5686b691d6b8..c4cf672ea26d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_opcode.h
> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
> @@ -85,7 +85,7 @@ enum rxe_hdr_mask {
>   	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
>   
>   	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
> -
> +	RXE_MCAST_MASK		= BIT(NUM_HDR_TYPES + 13),
>   	RXE_ATOMIC_WRITE_MASK   = BIT(NUM_HDR_TYPES + 14),
>   
>   	RXE_READ_OR_ATOMIC_MASK	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 5861e4244049..7153de0799fc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -217,6 +217,10 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>   	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
>   		qp = mca->qp;
>   
> +		/* don't reply packet to sender if locally sent */
> +		if (pkt->mask & RXE_MCAST_MASK && qp_num(qp) == deth_sqp(pkt))
> +			continue;
> +
>   		/* validate qp for incoming packet */
>   		err = check_type_state(rxe, pkt, qp);
>   		if (err)
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index d8c41fd626a9..599bec88cb54 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -442,8 +442,12 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
>   			(pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
>   			(RXE_WRITE_MASK | RXE_IMMDT_MASK));
>   
> -	qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
> -					 qp->attr.dest_qp_num;
> +	if (pkt->mask & RXE_MCAST_MASK)
> +		qp_num = IB_MULTICAST_QPN;
> +	else if (pkt->mask & RXE_DETH_MASK)
> +		qp_num = ibwr->wr.ud.remote_qpn;
> +	else
> +		qp_num = qp->attr.dest_qp_num;
>   
>   	ack_req = ((pkt->mask & RXE_END_MASK) ||
>   		(qp->req.noack_pkts++ > RXE_MAX_PKT_PER_ACK));
> @@ -809,6 +813,9 @@ int rxe_requester(struct rxe_qp *qp)
>   		goto err;
>   	}
>   
> +	if (rxe_is_mcast_av(av))
> +		pkt.mask |= RXE_MCAST_MASK;
> +
>   	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>   	if (unlikely(!skb)) {
>   		rxe_dbg_qp(qp, "Failed allocating skb\n");

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next v5 0/7] RDMA/rxe: Make multicast work
  2023-12-05  5:50 ` [PATCH for-next v5 0/7] RDMA/rxe: Make multicast work Zhu Yanjun
@ 2023-12-07 16:15   ` David Ahern
  0 siblings, 0 replies; 4+ messages in thread
From: David Ahern @ 2023-12-07 16:15 UTC (permalink / raw)
  To: Zhu Yanjun, Bob Pearson, jgg, linux-rdma, davem@davemloft.net,
	netdev@vger.kernel.org

On 12/4/23 10:50 PM, Zhu Yanjun wrote:
> Add  David S. Miller and  David Ahern.
> 
> They are the maintainers in netdev and very familiar with mcast.
> 

I am nowhere close to being an expert on multicast, but many members of
the netdev community are. This set seems to be refactoring rxe code vs
new use of Linux networking stack. If there is something specific,
please Cc netdev on the entire patch set with a request in the cover
letter for specific reviews of individual patches.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-12-07 16:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20231205002322.10143-1-rpearsonhpe@gmail.com>
2023-12-05  5:50 ` [PATCH for-next v5 0/7] RDMA/rxe: Make multicast work Zhu Yanjun
2023-12-07 16:15   ` David Ahern
     [not found] ` <20231205002322.10143-2-rpearsonhpe@gmail.com>
2023-12-05  5:52   ` [PATCH for-next v5 1/7] RDMA/rxe: Cleanup rxe_ah/av_chk_attr Zhu Yanjun
     [not found] ` <20231205002322.10143-3-rpearsonhpe@gmail.com>
2023-12-05  5:53   ` [PATCH for-next v5 2/7] RDMA/rxe: Fix sending of mcast packets Zhu Yanjun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).