Re: exofs/ore: allocation of _ore_get_io_state()

linux-fsdevel.vger.kernel.org archive mirror

From: Boaz Harrosh <bharrosh@panasas.com>
To: Idan Kedar <idank@tonian.com>
Cc: Linux FS Mailing List <linux-fsdevel@vger.kernel.org>
Subject: Re: exofs/ore: allocation of _ore_get_io_state()
Date: Thu, 24 May 2012 17:05:42 +0300
Message-ID: <4FBE4036.8020504@panasas.com>
In-Reply-To: <CABpMAyLNA6P9c_O6CRAFp+TtwSvT0USCDtHU8Ev9uda1dC0frA@mail.gmail.com>

On 05/24/2012 02:23 PM, Idan Kedar wrote:

> On Thu, May 24, 2012 at 12:00 AM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> Is there any point to check if the memory is greater than 32MB?
>>>
>>>
>>
>>
>> In theory it can allocate 32MB, in slab. I'm not sure about slob and slub.
>>
>> But in practice, allocation of contiguous physical pages tends to fail
>> very quickly on a system that has been up for a couple of hours. So we
>> avoid it like the plague.
>>
>> Past testing with tables bigger than PAGE_SIZE on the IO path gave
>> catastrophic results. (Again, once the system has been up for a while
>> and has had a chance to fragment the physical address space.)
> 
> What allocation sizes (of struct __alloc_all_io_state) are we talking
> about? How many devices per I/O did you encounter?
> 

Personally, I hit it when a scsi-lib sg_table allocation was bigger than
PAGE_SIZE (because of a bug). It is currently capped at PAGE_SIZE.
Other people reported the same failures, and great performance
degradation, when allocating BIOs and BIO_VECs larger than PAGE_SIZE.

It's simply the old, well-known page-fragmentation problem. It's
why virtual memory was invented in the first place.
kmalloc is not a virtual allocator: it returns physically contiguous
memory.
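
To make the fragmentation point concrete, here is a minimal user-space
sketch. It is a toy model, not the kernel's page allocator: the names
(toy_mem, toy_alloc_contig, NPAGES) are made up for illustration, and
"every other page frame is pinned" is an assumed worst case. It shows
that half of memory can be free while a request for just two contiguous
frames still fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 1024	/* hypothetical size of the toy "physical memory" */

/* One flag per page frame: true = in use, false = free. */
struct toy_mem {
	bool used[NPAGES];
};

/* Fresh boot: every frame is free. */
void toy_mem_init(struct toy_mem *m)
{
	for (size_t i = 0; i < NPAGES; i++)
		m->used[i] = false;
}

/* Assumed worst case after long uptime: every other frame is pinned. */
void toy_mem_fragment(struct toy_mem *m)
{
	for (size_t i = 0; i < NPAGES; i += 2)
		m->used[i] = true;
}

/* Total number of free frames, contiguous or not. */
size_t toy_mem_free_pages(const struct toy_mem *m)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (!m->used[i])
			n++;
	return n;
}

/*
 * First-fit search for `count` contiguous free frames.
 * Marks them used and returns the index of the first frame,
 * or -1 if no such run exists.
 */
long toy_alloc_contig(struct toy_mem *m, size_t count)
{
	size_t run = 0;

	assert(count > 0 && count <= NPAGES);
	for (size_t i = 0; i < NPAGES; i++) {
		run = m->used[i] ? 0 : run + 1;
		if (run == count) {
			size_t first = i + 1 - count;

			for (size_t j = first; j <= i; j++)
				m->used[j] = true;
			return (long)first;
		}
	}
	return -1;
}
```

After toy_mem_fragment(), toy_mem_free_pages() still reports NPAGES / 2
free frames, yet toy_alloc_contig(&m, 2) returns -1 while single-frame
requests keep succeeding. That is the situation a larger-than-PAGE_SIZE
kmalloc runs into.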

>>
>> The whole point of using sg-lists in the Kernel is to avoid allocating
>> contiguous physical pages and to avoid having to use virtual memory.
>>
>> This is done all over the Kernel. MAX_BIO_SIZE max-sg-table ...
> 
> Why not use virtual memory? Is this limitation imposed by the OSD
> initiator or by some other layer in the OSD stack?
> 

Welcome to Linux Kernel 101. vmalloc is tenfold slower than
kmalloc. And in principle the same thing happens: multiple discrete
pages are allocated and collected together, but now you also need
to set up TLB entries and make sure they are mapped in when needed
(every interrupt, every context switch).
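
The alternative to both a large kmalloc and vmalloc is to collect the
discrete pages yourself and never ask for more than one page at a time.
Below is a rough user-space sketch of that idea, in the spirit of a
chained table. It is only an analogy: toy_table, toy_chunk and
TOY_PAGE_SIZE are hypothetical names, calloc() stands in for a
page-sized kernel allocation, and the 4096-byte page size is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096	/* assumed page size */

/* One chunk of the table; header plus entries fit in one page. */
struct toy_chunk {
	struct toy_chunk *next;	/* link to the next chunk, or NULL */
	size_t nr;		/* number of entries in this chunk */
	void *entries[];	/* the entries themselves */
};

/* How many entries fit in a chunk without exceeding one page. */
#define TOY_PER_CHUNK \
	((TOY_PAGE_SIZE - sizeof(struct toy_chunk)) / sizeof(void *))

struct toy_table {
	struct toy_chunk *head;
	size_t nr_chunks;
	size_t nr_entries;
};

/* Release every chunk and reset the table. */
void toy_table_free(struct toy_table *t)
{
	struct toy_chunk *c = t->head;

	while (c) {
		struct toy_chunk *next = c->next;

		free(c);
		c = next;
	}
	t->head = NULL;
	t->nr_chunks = 0;
	t->nr_entries = 0;
}

/*
 * Build a table with room for nr_entries pointers. No single
 * allocation is larger than TOY_PAGE_SIZE. Returns 0 or -1.
 */
int toy_table_alloc(struct toy_table *t, size_t nr_entries)
{
	struct toy_chunk **link = &t->head;
	size_t left = nr_entries;

	t->head = NULL;
	t->nr_chunks = 0;
	t->nr_entries = 0;
	while (left > 0) {
		size_t n = left < TOY_PER_CHUNK ? left : TOY_PER_CHUNK;
		size_t bytes = sizeof(struct toy_chunk) + n * sizeof(void *);
		struct toy_chunk *c;

		assert(bytes <= TOY_PAGE_SIZE);
		c = calloc(1, bytes);	/* zeroed, so c->next is NULL */
		if (!c) {
			toy_table_free(t);
			return -1;
		}
		c->nr = n;
		*link = c;
		link = &c->next;
		t->nr_chunks++;
		t->nr_entries += n;
		left -= n;
	}
	return 0;
}

/* Find slot i by walking the chain; NULL if i is out of range. */
void **toy_table_slot(struct toy_table *t, size_t i)
{
	for (struct toy_chunk *c = t->head; c; c = c->next) {
		if (i < c->nr)
			return &c->entries[i];
		i -= c->nr;
	}
	return NULL;
}
```

The price is a walk over the chain on lookup instead of plain array
indexing. In exchange, every allocation stays at or below one page, so
it does not depend on finding contiguous physical pages and needs no
extra mappings.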

This single fact of "Linux Kernel code does not use VM" is a
measured 10-fold speed gain over the Windows Kernel.

>>
>> (BTW I saw this mail by chance. If you address it to me directly,
>>  I will see it for sure.)
>>
>> Cheers
>> Boaz
> 

Boaz
