public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found] ` <1466597161-5242-1-git-send-email-sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2016-06-22 16:15   ` Christoph Hellwig
       [not found]     ` <20160622161559.GA18361-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-22 16:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 22, 2016 at 03:05:59PM +0300, Sagi Grimberg wrote:
> This parameter was added in order to support a proper timeout for
> error recovery before the spec defined a periodic keep-alive.
> 
> Now that we have periodic keep-alive, we don't need a user configurable
> transport layer retry count, the keep-alive timeout is sufficient,
> transports can retry for as long as they see fit.

Isn't there some IB protocol level rationale for a low retry count
in various fabric setups?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]     ` <20160622161559.GA18361-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-06-22 16:31       ` Sagi Grimberg
  2016-06-22 20:31       ` Jason Gunthorpe
  2016-07-18 15:20       ` Bart Van Assche
  2 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-06-22 16:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


>> This parameter was added in order to support a proper timeout for
>> error recovery before the spec defined a periodic keep-alive.
>>
>> Now that we have periodic keep-alive, we don't need a user configurable
>> transport layer retry count, the keep-alive timeout is sufficient,
>> transports can retry for as long as they see fit.
>
> Isn't there some IB protocol level rationale for a low retry count
> in various fabric setups?

None that I know of... The QP retry count determines the time it would
take to fail a send/read/write.. The retry_count value is multiplied
with the packet timeout (which is a result of an IB specific
computation - managed by the CM).

It's useful when one needs to limit the time until a send fails in order
to kick error recovery (useful for srp which doesn't implement periodic
keep-alive), but since nvme does, I don't see the reason why RDMA or any
other transport should expose this configuration as the keep-alive
timeout exists for that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]     ` <20160622161559.GA18361-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2016-06-22 16:31       ` Sagi Grimberg
@ 2016-06-22 20:31       ` Jason Gunthorpe
       [not found]         ` <20160622203110.GA20838-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-07-18 15:20       ` Bart Van Assche
  2 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2016-06-22 20:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 22, 2016 at 09:15:59AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 22, 2016 at 03:05:59PM +0300, Sagi Grimberg wrote:
> > This parameter was added in order to support a proper timeout for
> > error recovery before the spec defined a periodic keep-alive.
> > 
> > Now that we have periodic keep-alive, we don't need a user configurable
> > transport layer retry count, the keep-alive timeout is sufficient,
> > transports can retry for as long as they see fit.
> 
> Isn't there some IB protocol level rationale for a low retry count
> in various fabric setups?

IIRC the retry count is part of what drives the APM switch over, so
APM configuration should use a lower value.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]         ` <20160622203110.GA20838-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-06-23  7:09           ` Sagi Grimberg
       [not found]             ` <576B8B30.8080402-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2016-06-23  7:09 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


>>> This parameter was added in order to support a proper timeout for
>>> error recovery before the spec defined a periodic keep-alive.
>>>
>>> Now that we have periodic keep-alive, we don't need a user configurable
>>> transport layer retry count, the keep-alive timeout is sufficient,
>>> transports can retry for as long as they see fit.
>>
>> Isn't there some IB protocol level rationale for a low retry count
>> in various fabric setups?
>
> IIRC the retry count is part of what drives the APM switch over, so
> APM configuration should use a lower value.

Completely agree Jason. Lowering the retry_count is very useful
for APM (Automatic Path Migration).
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]             ` <576B8B30.8080402-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2016-06-24  7:13               ` Christoph Hellwig
       [not found]                 ` <20160624071336.GE4252-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:13 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jason Gunthorpe, Christoph Hellwig,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jun 23, 2016 at 10:09:36AM +0300, Sagi Grimberg wrote:
> Completely agree Jason. Lowering the retry_count is very useful
> for APM (Automatic Path Migration).

Does this mean you're retracting the patches?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]                 ` <20160624071336.GE4252-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-06-26 15:48                   ` Sagi Grimberg
  2016-07-17 11:52                   ` Sagi Grimberg
  1 sibling, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-06-26 15:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>> Completely agree Jason. Lowering the retry_count is very useful
>> for APM (Automatic Path Migration).
>
> Does this mean you're retracting the patches?

I'm not, were not using APM anywhere in nvme-rdma. multipathing
is done at a higher level than the transport.

Do you see a reason to keep this? I'm not too enthusiast with leaving
configs that aren't absolutely needed. As mentioned this config was
added to add a fast-fail functionality before we defined the periodic 
keep-alive...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]                 ` <20160624071336.GE4252-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2016-06-26 15:48                   ` Sagi Grimberg
@ 2016-07-17 11:52                   ` Sagi Grimberg
       [not found]                     ` <578B718D.2090909-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2016-07-17 11:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>> Completely agree Jason. Lowering the retry_count is very useful
>> for APM (Automatic Path Migration).
>
> Does this mean you're retracting the patches?

No, we never use APM in nvme-rdma, so I don't see a good reason
why we should keep it around....

Can I get a review on this btw?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]                     ` <578B718D.2090909-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2016-07-18  4:09                       ` Christoph Hellwig
       [not found]                         ` <20160718040901.GA2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-07-18  4:09 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Sun, Jul 17, 2016 at 02:52:45PM +0300, Sagi Grimberg wrote:
> Can I get a review on this btw?

Jens already merged that patch after I pinged him last week.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]                         ` <20160718040901.GA2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-07-18  4:09                           ` Christoph Hellwig
       [not found]                             ` <20160718040946.GB2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-07-18  4:09 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Sun, Jul 17, 2016 at 09:09:01PM -0700, Christoph Hellwig wrote:
> On Sun, Jul 17, 2016 at 02:52:45PM +0300, Sagi Grimberg wrote:
> > Can I get a review on this btw?
> 
> Jens already merged that patch after I pinged him last week.

s/patch/series/.  Time to havea coffee..
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]                             ` <20160718040946.GB2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-07-18  8:01                               ` Sagi Grimberg
  0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-07-18  8:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>> Jens already merged that patch after I pinged him last week.
>
> s/patch/series/.  Time to havea coffee..

Thanks Christoph and Jens :)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]     ` <20160622161559.GA18361-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2016-06-22 16:31       ` Sagi Grimberg
  2016-06-22 20:31       ` Jason Gunthorpe
@ 2016-07-18 15:20       ` Bart Van Assche
       [not found]         ` <12b64608-1d42-ffe6-c11a-58139cbabd3a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2016-07-18 15:20 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 06/22/2016 09:15 AM, Christoph Hellwig wrote:
> On Wed, Jun 22, 2016 at 03:05:59PM +0300, Sagi Grimberg wrote:
>> This parameter was added in order to support a proper timeout for
>> error recovery before the spec defined a periodic keep-alive.
>>
>> Now that we have periodic keep-alive, we don't need a user configurable
>> transport layer retry count, the keep-alive timeout is sufficient,
>> transports can retry for as long as they see fit.
>
> Isn't there some IB protocol level rationale for a low retry count
> in various fabric setups?

The IB spec defines an end-to-end credit mechanism for RC connections. 
So if the transport layer is reliable (InfiniBand, RoCE with DCB 
enabled) setting the retry count high enough is only needed to avoid 
connection shutdown due to brief cable disconnect/reconnect events.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Get rid of transport layer retry count config parameter
       [not found]         ` <12b64608-1d42-ffe6-c11a-58139cbabd3a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-07-20  8:42           ` Sagi Grimberg
  0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-07-20  8:42 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


> The IB spec defines an end-to-end credit mechanism for RC connections.
> So if the transport layer is reliable (InfiniBand, RoCE with DCB
> enabled) setting the retry count high enough is only needed to avoid
> connection shutdown due to brief cable disconnect/reconnect events.

Right, this is why I think the driver can use whatever it sees fit (we
have a keep-alive mechanism for fast-fail functionality).
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-07-20  8:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1466597161-5242-1-git-send-email-sagi@grimberg.me>
     [not found] ` <1466597161-5242-1-git-send-email-sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-06-22 16:15   ` [PATCH 0/2] Get rid of transport layer retry count config parameter Christoph Hellwig
     [not found]     ` <20160622161559.GA18361-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-06-22 16:31       ` Sagi Grimberg
2016-06-22 20:31       ` Jason Gunthorpe
     [not found]         ` <20160622203110.GA20838-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-06-23  7:09           ` Sagi Grimberg
     [not found]             ` <576B8B30.8080402-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-06-24  7:13               ` Christoph Hellwig
     [not found]                 ` <20160624071336.GE4252-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-06-26 15:48                   ` Sagi Grimberg
2016-07-17 11:52                   ` Sagi Grimberg
     [not found]                     ` <578B718D.2090909-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-07-18  4:09                       ` Christoph Hellwig
     [not found]                         ` <20160718040901.GA2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-07-18  4:09                           ` Christoph Hellwig
     [not found]                             ` <20160718040946.GB2521-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-07-18  8:01                               ` Sagi Grimberg
2016-07-18 15:20       ` Bart Van Assche
     [not found]         ` <12b64608-1d42-ffe6-c11a-58139cbabd3a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-07-20  8:42           ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox