public inbox for linux-rdma@vger.kernel.org
From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
To: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>,
	Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Sebastian Riemer
	<sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>,
	linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
Date: Tue, 2 Jul 2013 13:18:42 -0600
Message-ID: <20130702191842.GD14625@obsidianresearch.com>
In-Reply-To: <1372677965.12468.57.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>

On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote:

> > The InfiniBand specification mentions the following about differential 
> > receiver inputs (C6-11.2.1): "A BER of 10^-12 shall be achieved when
> > connected to the worst case transmitter through any compliant channel".

This test condition is under 'stressed conditions' for the higher
rates, which doesn't directly translate into an error every 18
seconds under normal conditions.
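
For reference, the "error every 18 seconds" figure can be reproduced
with a small sketch. The 56 Gbit/s line rate below is an assumption
chosen for illustration (it is not stated in this thread), and the
function name is hypothetical:

```python
def seconds_between_bit_errors(ber, line_rate_bps):
    """Mean time between bit errors for a link that runs flat out.

    ber           -- bit error rate (errors per transmitted bit)
    line_rate_bps -- signalling rate in bits per second
    """
    return 1.0 / (ber * line_rate_bps)


# BER of 10^-12 comes from the quoted spec text; the 56 Gbit/s
# line rate is an assumed value, not taken from the thread.
interval = seconds_between_bit_errors(1e-12, 56e9)
print(f"about {interval:.1f} s between bit errors")
```

This treats the worst-case BER as if it were the everyday error rate,
which is exactly the translation cautioned against above.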

A properly functioning IB link should not be seeing link errors with
any frequency (e.g. we expect, and observe, no errors on single links
over month-long periods).

Naturally, a large number of IB links will, in aggregate, have a
significantly higher error rate..

.. and this whole area is very challenging: installations with
defective cables, defective end points, non-compliance, etc. are not
uncommon, so some sites see higher error rates. :(

However, IIRC, one large installation I know of had achieved the
target of nearly no errors on any link during typical operation.

In a practical sense, IB just doesn't work with packet loss. You need
to have low error rate signalling or your network doesn't work/won't
perform.

> > The maximum packet size for an InfiniBand packet is about 4 KB (see also 
> > section 7.7.8 in the spec). This means that with an 8b/10b encoding the 
> > chance to lose a packet over a single link due to bit errors is about 
> > 4*10^-8. So the chance to lose a packet over a network consisting of n 
> > links with retry count r is about (n*4*10^-8)^r. With r=2 that results 
> > already in a really low value, even with multiple links. Since lowering 
> > the QP timeout might make congestion worse my preference is to lower the 
> > retry count.
> 
> You assume independent failures, which is suspect -- many times these
> are data-dependent, or so I tend to think. Jason, do you have any
> insight on this (overall) topic you could share?

All data transmitted on modern serial links is 'whitened'
somehow. This is done independently on a link-by-link basis, either
with 8b/10b coding or with the 64b/66b scrambler. So the idea of a
high level 'magic packet' that causes data-dependent errors is not
statistically likely.

Errors in properly functioning modern serial links are purely random,
from the perspective of SRP.
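
With independent, random errors, the estimate quoted above can be
reproduced as a rough sketch. The function names are hypothetical, the
(n*p)^r form is simply the approximation from the quoted text, and the
BER is the worst-case spec figure rather than a measured rate:

```python
import math

BER = 1e-12                  # worst-case bit error rate from the quoted spec text
PACKET_BYTES = 4096          # "about 4 KB" maximum packet size
LINE_BITS_PER_DATA_BIT = 10 / 8   # 8b/10b: 10 line bits carry 8 data bits


def packet_loss_single_link(ber=BER, packet_bytes=PACKET_BYTES):
    """Probability that at least one line bit of a packet is hit on one link."""
    line_bits = packet_bytes * 8 * LINE_BITS_PER_DATA_BIT
    # 1 - (1 - ber)^line_bits, computed via log1p/expm1 to keep precision
    # when ber is tiny.
    return -math.expm1(line_bits * math.log1p(-ber))


def loss_after_retries(n_links, retry_count, p_link=None):
    """Quoted approximation (n * p)^r for losing a packet over n links."""
    if p_link is None:
        p_link = packet_loss_single_link()
    return (n_links * p_link) ** retry_count


p = packet_loss_single_link()
print(f"single link: {p:.3e}")
print(f"3 links, r=2: {loss_after_retries(3, 2):.3e}")
```

The single-link value comes out at roughly 4*10^-8, matching the quoted
figure. The multi-link result is only meaningful while errors on
different links and different attempts stay independent.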

It is best to use all the information the SM provides when setting up
the path; however, I don't think there is a best-practice idea yet for
how to set up the retry count..

Jason