netdev.vger.kernel.org archive mirror
From: "Alex Aizman" <itn780@yahoo.com>
To: <open-iscsi@googlegroups.com>
Cc: <mpm@selenic.com>, <andrea@suse.de>, <michaelc@cs.wisc.edu>,
	<James.Bottomley@HansenPartnership.com>, <netdev@oss.sgi.com>,
	"'David S. Miller'" <davem@davemloft.net>,
	<ksummit-2005-discuss@thunk.org>
Subject: RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
Date: Sun, 27 Mar 2005 13:14:42 -0800
Message-ID: <200503272115.j2RLF5d0007298@oss.sgi.com>
In-Reply-To: <20050326224621.61f6d917.davem@davemloft.net>

David S. Miller writes:
> 
> On Sat, 26 Mar 2005 22:33:01 -0800
> Dmitry Yusupov <dmitry_yus@yahoo.com> wrote:
> 
> > i.e. TCP stack should call NIC driver's callback after all SKB data
> > been successfully copied to the user space. At that point NIC driver
> > will safely replenish HW ring. This way we could avoid most of memory
> > allocations on receive.
> 
> How does this solve your problem?  This is just simple SKB 
> recycling, and it's a pretty old idea.
> 
> TCP packets can be held on receive for arbitrary amounts of time.
> 
> This is especially true if data is received out of order or 
> when packets are dropped.  We can't even wake up the user 
> until the holes in the sequence space are filled.
> 
> Even if data is received properly and in order, there are no 
> hard guarantees about when the user will get back onto the
> CPU to get the data copied to it.
> 
> During these gaps in time, you will need to keep your HW 
> receive ring populated with packets.


Here's the way I see it. 

1) There are iSCSI connections that should be "protected", resource-wise.
Examples: a remote swap device, a bank accounts database on RAID accessed
via iSCSI, etc.

2) There are two ways to protect the "protected" connections. One,
"Big Brother"-like, is a centralized Resource Manager that performs fully
deterministic resource accounting throughout the system, all the way from
NIC descriptors and on-chip memory up to the iSCSI buffers for Data-Out
headers.

3) The second way is *awareness* of the "protected" connections, propagated
throughout the system, along with incremental implementation of more
sophisticated recovery schemes.

4) The Resource Manager could be used in the following way. At session open
time, the iSCSI control plane calculates the iSCSI and TCP resources that
must be available at all times. The calculation is based on: the number of
SCSI commands to be processed in parallel ('can_queue'), the maximum size
of the SCSI payload in the SG list, the negotiated maximum number of
outstanding R2Ts, and the sizes of the Immediate and FirstBurst data.
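
A rough sketch of what the session-open calculation in (4) might look like
for the transmit side only. The structure, field names and the worst-case
model below are hypothetical illustrations, not taken from any existing
iSCSI stack; in particular it assumes each skbuf carries at most
'seg_bytes' of payload and that each outstanding R2T needs one Data-Out
header buffer in flight.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-session parameters (illustrative names only). */
struct iscsi_sess_params {
	unsigned int can_queue;           /* commands in flight (scsi-ml) */
	size_t max_sg_payload;            /* max SCSI payload per command */
	unsigned int max_outstanding_r2t; /* negotiated MaxOutstandingR2T */
	size_t immediate_data_len;        /* max immediate data per command */
	size_t first_burst_len;           /* negotiated FirstBurstLength */
	size_t seg_bytes;                 /* assumed payload bytes per skbuf */
};

static size_t div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Tx skbufs the session must always be able to hold, under this model. */
size_t iscsi_tx_skbufs_needed(const struct iscsi_sess_params *p)
{
	size_t unsol, imm, unsol_dataout, solicited, data, hdrs;

	assert(p->seg_bytes > 0);
	/* FirstBurstLength bounds all unsolicited data, immediate included. */
	unsol = min_sz(p->first_burst_len, p->max_sg_payload);
	imm = min_sz(p->immediate_data_len, unsol);
	unsol_dataout = unsol - imm;
	solicited = p->max_sg_payload - unsol; /* sent in response to R2Ts */

	data = div_round_up(imm, p->seg_bytes) +
	       div_round_up(unsol_dataout, p->seg_bytes) +
	       div_round_up(solicited, p->seg_bytes);

	hdrs = 1;                               /* SCSI Command PDU */
	if (unsol_dataout)
		hdrs += 1;                      /* unsolicited Data-Out */
	if (solicited)
		hdrs += p->max_outstanding_r2t; /* one per active R2T */

	return (size_t)p->can_queue * (data + hdrs);
}
```

With can_queue = 32, 256KB writes, MaxOutstandingR2T = 1, 8KB immediate
data, 64KB FirstBurst and 8KB per skbuf, this model asks for 35 skbufs per
command, i.e. 1120 for the session. Receive-side buffers would need a
similar (separate) estimate.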

5) If the Resource Manager reports that there are not enough resources,
iSCSI fails the session open. This is better than getting into trouble well
into runtime.
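
The fail-early rule in (5) amounts to admission control: reserve up front
or refuse. A minimal sketch, assuming a single global pool of buffers; the
names res_mgr, rm_reserve(), rm_release() and iscsi_session_open() are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical global Resource Manager pool (illustrative only). */
struct res_mgr {
	size_t total;    /* buffers the system can guarantee */
	size_t reserved; /* buffers already promised to protected sessions */
};

/* Reserve n buffers up front; refuse rather than overcommit. */
int rm_reserve(struct res_mgr *rm, size_t n)
{
	assert(rm->reserved <= rm->total);
	if (n > rm->total - rm->reserved)
		return -ENOMEM;
	rm->reserved += n;
	return 0;
}

void rm_release(struct res_mgr *rm, size_t n)
{
	assert(n <= rm->reserved);
	rm->reserved -= n;
}

/* Hypothetical session-open path: fail now, not well into runtime. */
int iscsi_session_open(struct res_mgr *rm, size_t needed)
{
	int err = rm_reserve(rm, needed);

	if (err)
		return err; /* open fails; nothing was reserved */
	/* ... login and parameter negotiation would continue here ... */
	return 0;
}
```

A failed open leaves the pool untouched, so an already-running protected
session never loses part of its reservation to a newcomer.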

6) For example: to transmit 'can_queue' commands, iSCSI needs N skbufs.
Suppose all 'can_queue' commands are transmitted in a burst, and just one
of them gets acknowledged by the target (via StatSN). In a fully
deterministic system this does not necessarily mean that scsi-ml can now
send one more command, because the full condition also includes recycling
of the skbuf(s) used to transmit that one completed command. Although it is
hard to imagine the remote target fully completing a command before its Tx
buffers get recycled, the theoretical chance exists (e.g., the NIC is slow,
or the driver has a poor Tx recycling implementation), and a fully
deterministic scheme must take it into account.

7) Therefore, before calling scsi_done(), iSCSI asks the Resource Manager
whether all the TCP (and other) resources used for this command have
already been recycled. If not, scsi_done() is postponed. In addition, iSCSI
"complains" to the Resource Manager that it is entering the slow path
because of this, which could prompt the latter to take action. (End of the
example.)
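
The gating in (6) and (7) can be expressed as "complete only when both the
status has arrived and every Tx skbuf has been recycled". A minimal sketch;
struct iscsi_cmd, the two event hooks, and the stand-ins for scsi_done()
and the Resource Manager complaint are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-command bookkeeping (not a real kernel structure). */
struct iscsi_cmd {
	unsigned int tx_skbs_in_flight; /* Tx skbufs not yet recycled */
	bool status_received;           /* response acknowledged via StatSN */
	bool done;                      /* completion reported to scsi-ml */
};

/* Stand-in for the Resource Manager's slow-path notification. */
unsigned int rm_slow_path_events;

static void rm_complain_slow_path(const struct iscsi_cmd *cmd)
{
	(void)cmd;
	rm_slow_path_events++;
}

/* Stand-in for scsi_done(): just mark the command finished. */
static void fake_scsi_done(struct iscsi_cmd *cmd)
{
	cmd->done = true;
}

static void try_complete(struct iscsi_cmd *cmd)
{
	if (cmd->status_received && cmd->tx_skbs_in_flight == 0 && !cmd->done)
		fake_scsi_done(cmd);
}

/* Called when the target's response for this command arrives. */
void iscsi_on_status(struct iscsi_cmd *cmd)
{
	cmd->status_received = true;
	if (cmd->tx_skbs_in_flight)
		rm_complain_slow_path(cmd); /* completion is postponed */
	try_complete(cmd);
}

/* Called from the (hypothetical) Tx-recycle path, once per skbuf. */
void iscsi_on_tx_recycled(struct iscsi_cmd *cmd)
{
	assert(cmd->tx_skbs_in_flight > 0);
	cmd->tx_skbs_in_flight--;
	try_complete(cmd);
}
```

In the common case the skbufs are recycled long before the response
arrives, so iscsi_on_status() completes immediately and the Resource
Manager hears nothing.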

8) If we agree to declare some connections "resource-protected", it
immediately follows that there may be other connections that are not
resource-protected. This, in turn, gives the Resource Manager the
flexibility to OOM-kill those unprotected connections and cannibalize their
resources for the protected ones.
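
One way to picture the cannibalization in (8): when a protected connection
runs short, the Resource Manager tears down unprotected connections until
enough buffers are freed. A minimal sketch; struct conn and rm_reclaim()
are hypothetical, and the "largest holder first" victim policy is just one
possible choice, not something proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection descriptor (illustrative only). */
struct conn {
	bool is_protected; /* declared resource-protected at setup */
	bool killed;       /* torn down to free its resources */
	size_t held;       /* buffers currently held by this connection */
};

/*
 * Free at least 'needed' buffers by killing unprotected connections,
 * largest holders first. Protected connections are never touched.
 * Returns the number of buffers actually freed (may be < needed).
 */
size_t rm_reclaim(struct conn *c, size_t n, size_t needed)
{
	size_t freed = 0;

	while (freed < needed) {
		struct conn *victim = NULL;

		for (size_t i = 0; i < n; i++) {
			if (c[i].is_protected || c[i].killed || c[i].held == 0)
				continue;
			if (!victim || c[i].held > victim->held)
				victim = &c[i];
		}
		if (!victim)
			break; /* nothing left to cannibalize */
		freed += victim->held;
		victim->held = 0;
		victim->killed = true;
	}
	return freed;
}
```

Returning less than 'needed' tells the caller that even killing every
unprotected connection was not enough, which is exactly the case the
session-open check in (5) is meant to rule out.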

9) Without some awareness of the resource-protected connections, and without
some kind of resource accounting at runtime (even partial and incomplete for
starters), the only remaining option for customers who require HA (High
Availability) is to over-engineer: use 64GB of RAM, TBs of disk space, etc.
That is probably not the end of the world, as long as prices keep going down.

Alex


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200503272115.j2RLF5d0007298@oss.sgi.com \
    --to=itn780@yahoo.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=andrea@suse.de \
    --cc=davem@davemloft.net \
    --cc=ksummit-2005-discuss@thunk.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=mpm@selenic.com \
    --cc=netdev@oss.sgi.com \
    --cc=open-iscsi@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).