netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	rds-devel@oss.oracle.com,
	Santosh Shilimkar <santosh.shilimkar@oracle.com>
Subject: Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization
Date: Wed, 21 Feb 2018 17:14:28 -0500	[thread overview]
Message-ID: <20180221221428.GG15244@oracle.com> (raw)
In-Reply-To: <CAF=yD-LecYGeH1Wg3kL-pDDKaFX0pbXzr_dfVPrsrBL3uL8VWw@mail.gmail.com>

On (02/21/18 16:54), Willem de Bruijn wrote:
> 
> I'd put this optimization behind a socket option.

Yes, that thought occurred to me as well- I think RDS applications
are unlikely to use the error_queue path because, as I mentioned
before, these are heavily request-response based, so you're 
going to be getting something back for each sendmsg anyway.

So if I had a sockopt, it would be something that would be
"piggyback completion" most (all?) of the time anyway, but I suppose
making it explicit is useful to have.

> Either that, or always store cookies on an RDS specific queue, return as recv
> cmsg and then remove the alternative error queue path completely.

I think the error queue path is good to have, if only to be aligned
with other socket families.

> > +       spin_lock_irqsave(&q->lock, flags);
> 
> This seems expensive on every recvmsg. Even if zerocopy is not enabled.

noted, I'll predicate it on both zcopy and the sockopt suggestion.

> > +       if (skb_queue_empty(q)) {
> > +               spin_unlock_irqrestore(&q->lock, flags);
> > +               return 0;
> > +       }
> > +       skb_queue_walk_safe(q, skb, tmp) {
> 
> This too. If anything, just peek at the queue head and skip otherwise.
> Anything else will cause an error, which blocks regular read and write
> on the socket.

but that would be technically incorrect, because you could have a 
mix/match of the zcopy notification with other sk_error_queue messages
(given that rds_rm_zerocopy_callback looks at the tail, and
creates a new skb if the tail is not a SO_EE_ORIGIN_ZCOOKIE).

Of course, its true that *today* the only thing in the rds socket
error queue is the cookie list, so this queue walk is a bit of overkill..

but maybe I am missing something you are concerned about? The queue walk is
intended to pull out the first SO_EE_ORIGIN_ZCOOKIE (if it exists)
and return quietly otherwise (leaving err_queue unchanged)- where
do you see the blocking error?


> > +               if (list_empty(&rs->rs_recv_queue) && nonblock) {
> > +                       ncookies = rds_recvmsg_zcookie(rs, msg);
> > +                       if (ncookies) {
> > +                               ret = 0;
> 
> This probably changes semantics for MSG_DONTWAIT. Will it ever return 0 now?

The only time MSG_DONTWAIT was previously returning 0 for PF_RDS
was when there was a congestion notification or rdma completion
i.e., the checks for rs_notify_queue and rs_cong_notify earlier
in the function, and in those 2 cases it was not returning data
(i.e. ret was always 0). In all other cases  rds_recvmsg would
always return either
- ret == -1, with errno set to EAGAIN or ETIMEDOUT, depending on the value
  of MSG_DONTWAIT, or,
- ret > 0 with data bytes.

That behavior (as well existing behavior for congestion notification
and rdma completion) has not changed. The only thing that changed is that
if there is no data to pass up, the ret will be 0, errno will be 0,
and the ancillary data may have completion cookies.

--Sowmini
  

  reply	other threads:[~2018-02-21 22:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-21 20:19 [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization Sowmini Varadhan
2018-02-21 21:04 ` Santosh Shilimkar
2018-02-21 21:54 ` Willem de Bruijn
2018-02-21 22:14   ` Sowmini Varadhan [this message]
2018-02-21 22:50     ` Willem de Bruijn
2018-02-21 23:03       ` Sowmini Varadhan
2018-02-21 23:45         ` Willem de Bruijn
2018-02-22  0:26           ` Sowmini Varadhan
2018-02-22  0:39             ` Willem de Bruijn
2018-02-22 13:36               ` Sowmini Varadhan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180221221428.GG15244@oracle.com \
    --to=sowmini.varadhan@oracle.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=rds-devel@oss.oracle.com \
    --cc=santosh.shilimkar@oracle.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).