From: Boaz Harrosh
Subject: Re: exofs/ore: allocation of _ore_get_io_state()
Date: Thu, 24 May 2012 17:05:42 +0300
Message-ID: <4FBE4036.8020504@panasas.com>
References: <4FBD4FFE.7020703@panasas.com>
To: Idan Kedar
Cc: Linux FS Maling List

On 05/24/2012 02:23 PM, Idan Kedar wrote:
> On Thu, May 24, 2012 at 12:00 AM, Boaz Harrosh wrote:
>>> Is there any point to check if the memory is greater than 32MB?
>>>
>>
>> In theory it can allocate 32MB, in slab. I'm not sure about slob and slub.
>>
>> But in practice, allocation of contiguous physical pages tends to fail
>> very fast on a system that has been up a couple of hours. So we avoid
>> it like the plague.
>>
>> Past testing with tables bigger than PAGE_SIZE on the IO path gave
>> catastrophic results. (Again, once the system is up for a while and
>> has had a chance to fragment the physical address space)
>
> What allocation sizes (of struct __alloc_all_io_state) are we talking
> about? how many devices per I/O did you encounter?
>

Personally I hit it with a scsi-lib sg_table allocation bigger than
PAGE_SIZE (because of a bug). It is currently capped at PAGE_SIZE.

Other people reported the same failures, and severe performance
degradation, when allocating BIOs and BIO_VECs larger than PAGE_SIZE.

It's simply the old and known page-fragmentation problem. It's why
virtual memory was invented in the first place. kmalloc is not a
virtual allocator.

>>
>> The whole point of using sg-lists in the Kernel is to not allocate
>> contiguous physical pages and to not have to use virtual memory.
>>
>> This is done all over the Kernel.
>> MAX_BIO_SIZE max-sg-table ...
>
> Why not use virtual memory? Is this limitation imposed by the OSD
> initiator or by some other layer in the OSD stack?
>

Welcome to Linux Kernel 101. vmalloc is tenfold slower than kmalloc.
And in principle the same thing will happen: multiple discrete pages
will be allocated and collected together, but now you also need to set
up TLB entries and make sure they are mapped in when needed (every
interrupt, every context switch).

This single fact of "Linux Kernel code does not use VM" is a tenfold
speed gain over the Windows Kernel, measured.

>>
>> (BTW I saw this mail by chance. If you direct it to me I see it
>> for sure)
>>
>> Cheers
>> Boaz
>

Boaz
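
To illustrate the sg-list idea discussed above, here is a minimal
user-space sketch in C. It is not kernel code and not how scsi-lib or
ORE implement it. It only shows the shape of the technique: a large
table is held as a chain of page-sized chunks, so no single allocation
is ever bigger than one page. CHUNK_BYTES stands in for PAGE_SIZE, and
struct entry, struct chunk and the table_*() helpers are hypothetical
names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 4096 stands in for the kernel's PAGE_SIZE. */
#define CHUNK_BYTES 4096u

/* Hypothetical sg-like entry: one buffer address and its length. */
struct entry {
	void *addr;
	size_t len;
};

/* As many entries as fit in one chunk next to the link pointer. */
#define ENTRIES_PER_CHUNK \
	((CHUNK_BYTES - sizeof(void *)) / sizeof(struct entry))

/* One page-sized piece of the table, chained to the next piece. */
struct chunk {
	struct chunk *next;
	struct entry ents[ENTRIES_PER_CHUNK];
};

/* The whole point: a single allocation never exceeds one "page". */
_Static_assert(sizeof(struct chunk) <= CHUNK_BYTES,
	       "chunk must fit in one page");

/* Number of chunks needed to hold nents entries (rounded up). */
size_t table_chunks(size_t nents)
{
	return (nents + ENTRIES_PER_CHUNK - 1) / ENTRIES_PER_CHUNK;
}

/* Free every chunk in the chain. Accepts NULL. */
void table_free(struct chunk *head)
{
	while (head) {
		struct chunk *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Allocate a zeroed table for nents entries as a chain of chunks.
 * Returns NULL on allocation failure (or when nents is 0).
 */
struct chunk *table_alloc(size_t nents)
{
	struct chunk *head = NULL;
	size_t i, n = table_chunks(nents);

	for (i = 0; i < n; i++) {
		struct chunk *c = calloc(1, sizeof(*c));

		if (!c) {
			table_free(head);
			return NULL;
		}
		/* All chunks are empty, so prepending is fine. */
		c->next = head;
		head = c;
	}
	return head;
}

/* Return entry idx, or NULL if idx lies past the last chunk. */
struct entry *table_get(struct chunk *head, size_t idx)
{
	size_t skip = idx / ENTRIES_PER_CHUNK;

	while (head && skip--)
		head = head->next;
	return head ? &head->ents[idx % ENTRIES_PER_CHUNK] : NULL;
}
```

A caller would do table_alloc(1000), reach individual entries through
table_get(), and release everything with table_free(). The price of
never asking for contiguous memory is that lookup walks the chain
instead of indexing one flat array.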