From mboxrd@z Thu Jan  1 00:00:00 1970
From: James Bottomley
Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
Date: Tue, 29 Mar 2005 09:56:48 -0600
Message-ID: <1112111808.5510.16.camel@mulgrave>
References: <424346FE.20704@cs.wisc.edu> <20050324233921.GZ14202@opteron.random>
	<20050325034341.GV32638@waste.org> <20050327035149.GD4053@g5.random>
	<20050327054831.GA15453@waste.org> <1111905181.4753.15.camel@mylaptop>
	<20050326224621.61f6d917.davem@davemloft.net> <1112027284.5531.27.camel@mulgrave>
	<20050329152008.GD63268@muc.de>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Rik van Riel, Dmitry Yusupov, mpm@selenic.com, andrea@suse.de,
	michaelc@cs.wisc.edu, open-iscsi@googlegroups.com,
	ksummit-2005-discuss@thunk.org, netdev@oss.sgi.com
Return-path:
To: Andi Kleen
In-Reply-To: <20050329152008.GD63268@muc.de>
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On Tue, 2005-03-29 at 17:20 +0200, Andi Kleen wrote:
> > Actually, not in 2.6 ... we had the same issue in SCSI using mempools
> > for sglist allocation.  All of the mempool allocation paths now take
> > gfp_flags, so you can specify GFP_ATOMIC for interrupt context.
>
> Just does not work when you are actually short of memory.
>
> Just think a second about how a mempool works: in the extreme case,
> when it cannot allocate system memory any more, it has to wait for
> someone else to free a memory block into the mempool, then pass it on
> to the next allocator, etc.  Basically it is a direct bypass pipeline
> to pass memory from one high-priority user to another.  This only
> works with sleeping; otherwise you could not handle an arbitrary
> number of users with a single mempool.
>
> So to get a reliable mempool, you have to sleep on allocation.

But that's not what we use them for.  You are confusing reliability
with forward progress.
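Andi's description of a mempool as a reserve that frees refill before memory goes back to the system can be sketched in userspace C. This is a hedged model, not the kernel's mempool.c: the struct, `pool_create`, `pool_alloc`, and `pool_free` names here are invented for illustration, and the `sys_oom` flag stands in for the system allocator being exhausted.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal userspace model of a mempool reserve (not kernel code).
 * The reserve holds min_nr pre-allocated elements; allocation falls
 * back to it when the system allocator "fails", and frees refill the
 * reserve before anything is returned to the system. */
struct pool {
    void  **reserve;   /* stack of pre-allocated elements */
    int     curr_nr;   /* elements currently in the reserve */
    int     min_nr;    /* reserve capacity */
    size_t  size;      /* element size */
};

static struct pool *pool_create(int min_nr, size_t size)
{
    struct pool *p = malloc(sizeof(*p));
    p->reserve = malloc(min_nr * sizeof(void *));
    p->min_nr = min_nr;
    p->size = size;
    for (p->curr_nr = 0; p->curr_nr < min_nr; p->curr_nr++)
        p->reserve[p->curr_nr] = malloc(size);
    return p;
}

/* Non-blocking allocation, GFP_ATOMIC-style: may return NULL.
 * sys_oom simulates the system allocator having nothing left. */
static void *pool_alloc(struct pool *p, int sys_oom)
{
    if (!sys_oom)
        return malloc(p->size);
    if (p->curr_nr > 0)
        return p->reserve[--p->curr_nr];
    return NULL;          /* caller must cope: requeue and retry */
}

static void pool_free(struct pool *p, void *elem)
{
    if (p->curr_nr < p->min_nr)
        p->reserve[p->curr_nr++] = elem;  /* refill reserve first */
    else
        free(elem);
}
```

With a sleeping allocation the NULL case would instead block until `pool_free` runs, which is the bypass-pipeline behavior described above; the non-blocking variant trades that for a failure the caller must handle.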
In SCSI we use GFP_ATOMIC mempools in order to make forward progress.
All the paths are coded to expect a failure (in which case we requeue).
For forward progress, what we need is the knowledge that there are n
resources out there dedicated to us.  When they return, they get
reallocated straight to us and we can restart the queue processing
(there's actually a SCSI trigger that does this).

For receive mempools, the situation is much the same; if you have n
reserved buffers, then you have to drop the (n+1)th packet.  However,
the resources will free up and go back to your mempool, and eventually
you accept the packet on retransmit.  The killer scenario (and why we
require a mempool) is that someone else gets the memory before you but
then becomes blocked on another allocation, so now you have no more
allocations to allow forward progress.

James

> > The object isn't to make the queues *reliable*, it's to ensure the
> > system can make forward progress.  So all we're trying to ensure is
> > that the sockets used to service storage have some probability of
> > being able to send and receive packets during low memory.
>
> For that it is enough to make the sender reliable.  Retransmit
> takes care of the rest.

No ... we cannot get down to the situation where GFP_ATOMIC always
fails.  Now we have no receive capacity at all and the system
deadlocks.

> > In your scenario, if we're out of memory and the system needs
> > several ACKs to the swap device for pages to be released to the
> > system, I don't see how we make forward progress, since without a
> > reserved resource to allocate from, how does the ack make it up the
> > stack to the storage driver layer?
>
> Typically because the RX ring of the driver has some packets left.
>
> Also since TCP is very persistent and there is some memory
> activity left, you will have at least occasionally a time slot
> where a GFP_ATOMIC allocation can succeed.

That's what I think a mempool is required to guarantee.
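The receive-mempool discipline described above, n dedicated buffers, drop the (n+1)th packet, accept the retransmit once a buffer returns, can be modeled in a few lines of userspace C. This is a sketch under assumed names (`rx_packet`, `rx_complete`, `RESERVED` are invented for illustration), not driver code.

```c
#include <assert.h>

/* Hypothetical model of a receive reserve dedicated to storage
 * traffic: RESERVED buffers are set aside, excess packets are
 * dropped, and completions return buffers to the reserve. */
#define RESERVED 2

static int bufs_free = RESERVED;

/* Try to accept a packet; returns 1 on success, 0 on drop (the
 * sender's TCP retransmit will present the packet again later). */
static int rx_packet(void)
{
    if (bufs_free == 0)
        return 0;       /* drop the (n+1)th packet */
    bufs_free--;
    return 1;
}

/* Completion path: the buffer goes straight back to the reserve,
 * re-enabling receive; this is the forward-progress guarantee. */
static void rx_complete(void)
{
    bufs_free++;
}
```

The point of the model is the invariant, not the mechanism: because completions refill the reserve rather than a global pool, no third party blocked on another allocation can steal the buffers, which is exactly the killer scenario the mempool exists to rule out.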
Without it, there are scenarios where GFP_ATOMIC always fails.

James