From mboxrd@z Thu Jan 1 00:00:00 1970
From: Leon Romanovsky
Subject: Re: [PATCH] RDMA/core: Add wait/retry version of ibnl_unicast
Date: Thu, 29 Jun 2017 08:04:36 +0300
Message-ID: <20170629050436.GO1248@mtr-leonro.local>
References: <1498658565-3408-1-git-send-email-mustafa.ismail@intel.com>
 <20170628141211.GA16312@ctung-MOBL3.amr.corp.intel.com>
 <20170628153639.GF1248@mtr-leonro.local>
 <20170628203003.GA23300@ctung-MOBL3.amr.corp.intel.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="VJvbhUb+lmZn6RCr"
Return-path:
Content-Disposition: inline
In-Reply-To: <20170628203003.GA23300-TZeIlv3TuzOfrEmaQUPKxl95YUYmaKo1UNDiOz3kqAs@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Chien Tin Tung
Cc: Mustafa Ismail,
 linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
 swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org,
 e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
 shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org
List-Id: linux-rdma@vger.kernel.org

--VJvbhUb+lmZn6RCr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote:
> On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote:
> > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote:
> > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote:
> > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
> > > > and modify ibnl_unicast to not wait/retry. This eliminates
> > > > the undesirable wait for future users of ibnl_unicast.
> > > >
> > > > Change Portmapper calls originating from kernel to user-space
> > > > to use ibnl_unicast_wait and take advantage of the wait/retry
> > > > logic in netlink_unicast.
> > > >
> > > > Signed-off-by: Mustafa Ismail
> > > > Signed-off-by: Chien Tin Tung
> > > > ---
> > > >  drivers/infiniband/core/iwpm_msg.c |  6 +++---
> > > >  drivers/infiniband/core/netlink.c  | 12 +++++++++++-
> > > >  include/rdma/rdma_netlink.h        | 10 ++++++++++
> > > >  3 files changed, 24 insertions(+), 4 deletions(-)
> > >
> > > Please apply this patch instead of Leon's patch to revert
> > > "IB/core: Add flow control to the portmapper netlink calls".
> > >
> > > Leon, we can work out names and parameters if this works for you.
> >
> > Chien,
> >
> > The names are less my worries with this patch. First of all, it misleads
> > by using wait/retry naming, because it blocks and not waits.
>
> Nope. It does a single shot retry and waits in a waitqueue.
> Go look at netlink_unicast and in turn netlink_attachskb. If you still
> disagree, please flag specific code where it blocks.

I agree, it wouldn't block in your scenario. However, will it work in more
hostile environments?

For example, a malicious user can open an RDMA netlink socket directly
(socket(...)), set sndtimeo to MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1) and
send custom netlink messages right to your new _wait function. If I
understand the code correctly, it will add them to the waitqueue and won't
release the skb until the end of processing. Will that cause the whole
netlink socket to be marked as NETLINK_S_CONGESTED? Will other users be able
to make progress with their messages, or will they need to wait until those
_wait calls finish?

>
> Here are the two functions for your convenience.
>
>
> int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
>                     u32 portid, int nonblock)
> {
>         struct sock *sk;
>         int err;
>         long timeo;
>
>         skb = netlink_trim(skb, gfp_any());
>
>         timeo = sock_sndtimeo(ssk, nonblock);
> retry:
>         sk = netlink_getsockbyportid(ssk, portid);
>         if (IS_ERR(sk)) {
>                 kfree_skb(skb);
>                 return PTR_ERR(sk);
>         }
>         if (netlink_is_kernel(sk))
>                 return netlink_unicast_kernel(sk, skb, ssk);
>
>         if (sk_filter(sk, skb)) {
>                 err = skb->len;
>                 kfree_skb(skb);
>                 sock_put(sk);
>                 return err;
>         }
>
>         err = netlink_attachskb(sk, skb, &timeo, ssk);
>         if (err == 1)
>                 goto retry;
>         if (err)
>                 return err;
>
>         return netlink_sendskb(sk, skb);
> }
>
> /*
>  * Attach a skb to a netlink socket.
>  * The caller must hold a reference to the destination socket. On error, the
>  * reference is dropped. The skb is not send to the destination, just all
>  * all error checks are performed and memory in the queue is reserved.
>  * Return values:
>  * < 0: error. skb freed, reference to sock dropped.
>  * 0: continue
>  * 1: repeat lookup - reference dropped while waiting for socket memory.
>  */
> int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
>                       long *timeo, struct sock *ssk)
> {
>         struct netlink_sock *nlk;
>
>         nlk = nlk_sk(sk);
>
>         if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
>              test_bit(NETLINK_S_CONGESTED, &nlk->state))) {
>                 DECLARE_WAITQUEUE(wait, current);
>                 if (!*timeo) {
>                         if (!ssk || netlink_is_kernel(ssk))
>                                 netlink_overrun(sk);
>                         sock_put(sk);
>                         kfree_skb(skb);
>                         return -EAGAIN;
>                 }
>
>                 __set_current_state(TASK_INTERRUPTIBLE);
>                 add_wait_queue(&nlk->wait, &wait);
>
>                 if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
>                      test_bit(NETLINK_S_CONGESTED, &nlk->state)) &&
>                     !sock_flag(sk, SOCK_DEAD))
>                         *timeo = schedule_timeout(*timeo);
>
>                 __set_current_state(TASK_RUNNING);
>                 remove_wait_queue(&nlk->wait, &wait);
>                 sock_put(sk);
>
>                 if (signal_pending(current)) {
>                         kfree_skb(skb);
>                         return sock_intr_errno(*timeo);
>                 }
>                 return 1;
>         }
>         netlink_skb_set_owner_r(skb, sk);
>         return 0;
> }
>
>
> BTW, _nobody_ is resetting the socket attribute from O_NONBLOCK.
>
> It is very difficult to understand your argument of "blocking" when you are not
> sharing the specifics. Please put your finger on it so everyone can understand
> your point.

I hope I have answered your questions.

>
> > The second, I disagree with solution in kernel for user space application which can't
> > handle the netlink errors.
>
> There is no guarantee delivery nor blocking on send. Like I mentioned above,
> it is a 1 shot retry with a set wait time. The code obviousely handles error
> condition as it can happen.

So, can you please refresh our memory and explain again what exactly
this patch is fixing if user-space handles errors correctly?
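
For anyone following the control flow, here is a rough user-space model of
the return-value contract documented in the netlink_attachskb() comment quoted
above (< 0 error, 0 continue, 1 repeat lookup). It is only a sketch, not kernel
code: struct model_sock, model_attachskb() and model_unicast() are hypothetical
names, sleeping in schedule_timeout() is simulated by subtracting ticks, and
the signal_pending() path, skb handling and reference counting are left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the receiving socket's state. */
struct model_sock {
	long congested_for;	/* ticks until the receive queue has room */
};

/*
 * Models only the return contract:
 *   < 0: error, 0: continue, 1: caller must repeat the lookup.
 */
static int model_attachskb(struct model_sock *sk, long *timeo)
{
	if (sk->congested_for > 0) {
		long slept;

		/* Nonblocking sender, or the timeout is already used up. */
		if (*timeo == 0)
			return -EAGAIN;

		/* "Sleep" until the queue drains or the timeout expires. */
		slept = sk->congested_for < *timeo ? sk->congested_for : *timeo;
		sk->congested_for -= slept;
		*timeo -= slept;	/* remaining time, like schedule_timeout() */
		return 1;
	}
	return 0;		/* queue has room, the send may proceed */
}

/* Models the "retry:" loop in netlink_unicast(). */
static int model_unicast(struct model_sock *sk, long timeo)
{
	int err;

	do {
		err = model_attachskb(sk, &timeo);
	} while (err == 1);

	return err;
}
```

In this model a sender with timeo == 0 gets -EAGAIN immediately on a congested
socket, a sender whose timeout outlasts the congestion returns 0 after one
wait, and a sender whose timeout is shorter than the congestion waits once and
then gets -EAGAIN on the retry.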
>
>
> Chien

--VJvbhUb+lmZn6RCr
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkhr/r4Op1/04yqaB5GN7iDZyWKcFAllUimQACgkQ5GN7iDZy
WKe94g/9ELAmJPLerYD0qtChsqtnUM9USU0GCnLdreMfozaeaAJKYpQ9EF6uEGHX
UpouITsLqmP24y0Q7bidCq++b156axmUD3sV5Ptu1+VWwaHdoR/71hmTgOkv8kvM
BgzIH+sM/han1853W+1OH4trEbc+GHREz2PcNafO4cFMvaB13/U+zWk8pd8wo7DY
ikPivZl0yHZs02wGcgLypcbXvwwHXCauPpxaQ4TgmaxYDJlhbtz0S0PRI6ehXag2
lXrIw7a6pm4p6cYl714OIz//I6JxTgMxrbRkaCcA0WEzbvcN5KpHAiJfH1KxbILX
fFeuKUtY25zVNYr2MZTJ/uJCWWxVWM8O+DHUXzh1JIfMy6a839io8SkeWpWperRT
z5gd2/41nP2OiDlF9MqPNzhOC+ZXbNKQ2RvT9/CDFPkKBZYDGPHw0F8wlXsWFumt
T0SlArwFjH+iQncgEwgwLc1b8ZgRNZ+WNODs0UayIRzXN22UIJ5u+xOiRoc7//SW
ohFe9imseDjjKHu8O1QCDFEW7A/rw3Mw07fDMWJvkxuT5iAGj7+NT1Nk0ZstWT6H
ElY0B4CTm5nfI8buidyyzem7aA88A72GuvKFkGVfc9Q+wsgIAnVBHexztPdWW6CF
jSLujkKr2h0O9+S8shoRFWSAGVMTTElcihKXlH90SNqDxCSj1Hc=
=+G/6
-----END PGP SIGNATURE-----

--VJvbhUb+lmZn6RCr--