Re: [PATCH] RDS: sync congestion map updating

linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Wengang Wang <wen.gang.wang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: santosh shilimkar
	<santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: leon-2ukJVAZIZ/Y@public.gmane.org
Subject: Re: [PATCH] RDS: sync congestion map updating
Date: Thu, 31 Mar 2016 10:59:10 +0800	[thread overview]
Message-ID: <56FC927E.9090404@oracle.com> (raw)
In-Reply-To: <56FC82B7.3070504-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>



在 2016年03月31日 09:51, Wengang Wang 写道:
>
>
> 在 2016年03月31日 01:16, santosh shilimkar 写道:
>> Hi Wengang,
>>
>> On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
>>> On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
>>>> Problem is found that some among a lot of parallel RDS 
>>>> communications hang.
>>>> In my test ten or so among 33 communications hang. The send 
>>>> requests got
>>>> -ENOBUF error meaning the peer socket (port) is congested. But 
>>>> meanwhile,
>>>> peer socket (port) is not congested.
>>>>
>>>> The congestion map updating can happen in two paths: one is in 
>>>> rds_recvmsg path
>>>> and the other is when it receives packets from the hardware. There 
>>>> is no
>>>> synchronization when updating the congestion map. So a bit 
>>>> operation (clearing)
>>>> in the rds_recvmsg path can be skipped by another bit operation 
>>>> (setting) in
>>>> hardware packet receving path.
>>>>

To be more detailed.  Here, the two paths (user calls recvmsg and 
hardware receives data) are for different rds socks. thus the 
rds_sock->rs_recv_lock is not helpful to sync the updating on congestion 
map.

thanks,
wengang
>>>> Fix is to add a spin lock per congestion map to sync the update on it.
>>>> No performance drop found during the test for the fix.
>>>
>>> I assume that this change fixed your issue, however it looks suspicious
>>> that performance wasn't change.
>>>
>> First of all thanks for finding the issue and posting patch
>> for it. I do agree with Leon on performance comment.
>> We shouldn't need locks for map updates.
>>
> Here is the performance data I collected yesterday.
> Settings:
> net.core.rmem_default = 4194304
> net.core.wmem_default = 262144
> net.core.rmem_max = 4194304
> net.core.wmem_max = 2097152
>
> test case:  rds-stress -s 192.168.111.16 -q 1m -d 10 -T 300 -t 10
> With 1M size sends, the 10 pending send request is enough to trigger 
> the congestion on receiver side. And the test last 5 mins.
>
> result is like this:
> without patch:
> 10   2231   2355 4697759.63       0.00       0.00  473.38 19123.89 
> -1.00  (average)
> receiver
>  10   2356   2231 4698350.06       0.00       0.00  486.28 18537.23 
> -1.00  (average)
>
> with patch applied:
> sender
> 10   2230   2396 47x.53       0.00       0.00  475.87 31954.35 -1.00  
> (average)
> receiver
> 10   2396   2230 4738051.76       0.00       0.00  480.85 18408.13 
> -1.00  (average)
>
> So I don't see performance drops. On a previous test, the test result 
> is reverted that is it's faster when patch not applied, but the 
> numbers is till 47xxxxx VS 46xxxxx.  So I don't have a very stable 
> test result. But in average, no obvious performance drop.
>
> Let me try to explain from theory:
> Firstly, No matter the rds_recvmsg path or the hardware receiving data 
> path, we have rds_sock->rs_recv_lock (this is not enough to fix our 
> issue here since there could be many different rds_socks) locked very 
> near before we lock the congestion map.  So the performance drop on 
> CPU cache refilling is small.
> Secondly, though the problem exist,  the malformed map may be not 
> happening that frequent especially for this test case, 10 parallel 
> communication.
>
>> Moreover the parallel receive path on which this patch
>> is based of doesn't exist in upstream code. I have kept
>> that out so far because of similar issue like one you
>> encountered.
> But I don't see how rds_recvmsg path is different from UEK kernels. 
> Can you explain more here or offline?
>
>>
>> Anyways lets discuss offline about the fix even for the
>> downstream kernel. I suspect we can address it without locks.
>>
> If in normal use we have no performace issue (and before we found 
> import use case that would hit), I think locking is fine.
> Well, what ideas do you have to prevent using locks? After all we are 
> updating a 8KB bitmap, not a single uint64 or less length variable. No 
> matter we use lock or not, we need to make sure the bits to update 
> can't be cached on different CPUs.
>
> thanks,
> wengang
>
>> Reagrds,
>> Santosh
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2016-03-31  2:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-30  9:08 [PATCH] RDS: sync congestion map updating Wengang Wang
     [not found] ` <1459328902-31968-1-git-send-email-wen.gang.wang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-30 16:19   ` Leon Romanovsky
     [not found]     ` <20160330161952.GA2670-2ukJVAZIZ/Y@public.gmane.org>
2016-03-30 17:16       ` santosh shilimkar
     [not found]         ` <56FC09D6.7090602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-31  1:51           ` Wengang Wang
     [not found]             ` <56FC82B7.3070504-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-31  2:59               ` Wengang Wang [this message]
     [not found]                 ` <56FC927E.9090404-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-04-01 19:47                   ` santosh shilimkar
2016-04-02  1:14                     ` Leon Romanovsky
     [not found]                       ` <20160402011459.GC8565-2ukJVAZIZ/Y@public.gmane.org>
2016-04-02  4:30                         ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
2016-03-31  1:24       ` Wengang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FC927E.9090404@oracle.com \
    --to=wen.gang.wang-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
    --cc=leon-2ukJVAZIZ/Y@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).