* Recommendations on System Tuning for JFFS2/MTD 128 KiB Block Page Allocation Failures
@ 2011-03-16 17:28 Grant Erickson
  2011-04-01 15:26 ` [RFC] Removing FIXMEs in mtdchar.c: mtd_{read,write} (was Re: Recommendations on System Tuning for JFFS2/MTD 128 KiB Block Page Allocation Failures) Grant Erickson
  0 siblings, 1 reply; 2+ messages in thread
From: Grant Erickson @ 2011-03-16 17:28 UTC (permalink / raw)
  To: linux-mtd

I've an OMAP3 ARM-based embedded system with 256 MiB of NAND flash and 64
MiB of RAM on Linux 2.6.32 in which both sys_mount (via mount) and sys_read
(via fw_setenv) occasionally fail with "page allocation failure. order:5,
mode:0xd0".
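
For reference, the numbers in that message can be decoded with a little arithmetic. The following is a minimal user-space sketch, assuming 4 KiB pages and the GFP bit values found in include/linux/gfp.h in 2.6.32-era kernels. The SKETCH_* names and the helper functions are hypothetical and are not kernel APIs.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as is typical for this class of ARM system. */
#define SKETCH_PAGE_SIZE 4096u

/* GFP bit values as found in include/linux/gfp.h for 2.6.32-era kernels. */
#define SKETCH_GFP_WAIT   0x10u
#define SKETCH_GFP_IO     0x40u
#define SKETCH_GFP_FS     0x80u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Bytes covered by one buddy-allocator block of the given order. */
size_t order_to_bytes(unsigned int order)
{
    return (size_t)SKETCH_PAGE_SIZE << order;
}

/* Smallest order whose block covers `size` bytes. */
unsigned int bytes_to_order(size_t size)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < size)
        order++;
    return order;
}

/* Nonzero if `mode` is exactly the GFP_KERNEL combination above. */
int mode_is_gfp_kernel(unsigned int mode)
{
    return mode == SKETCH_GFP_KERNEL;
}
```

Under these assumptions, order 5 corresponds to 32 contiguous pages (128 KiB) and mode 0xd0 corresponds to GFP_KERNEL, which matches the two kmalloc call sites described in this message.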

In the analysis I've done so far, sys_mount funnels down to
jffs2_scan_medium which eventually calls kmalloc with a size of 128 KiB (to
cover a single NAND erase block) and flag GFP_KERNEL:

    sw/tps/linux/linux/fs/jffs2/scan.c:
    ...
            /* Respect kmalloc limitations */
            if (buf_size > 128*1024)
                buf_size = 128*1024;

            D1(printk(KERN_DEBUG "Allocating readbuf of %d bytes\n",
                      buf_size));
            flashbuf = kmalloc(buf_size, GFP_KERNEL);
            if (!flashbuf)
                return -ENOMEM;
        }
    ...

The sys_read case winds down to mtd_read which eventually calls kmalloc with
a size of 128 KiB (to cover a single NAND erase block) and flag GFP_KERNEL:

    sw/tps/linux/linux/drivers/mtd/mtdchar:
    ...
        if (count > MAX_KMALLOC_SIZE)
            kbuf = kmalloc(MAX_KMALLOC_SIZE, GFP_KERNEL);
        else
            kbuf = kmalloc(count, GFP_KERNEL);
    ...

Both of these kmallocs ultimately funnel down to __alloc_pages_nodemask in
linux/mm/page_alloc.c and, from there, fall through to the 'nopage' label at
the bottom of __alloc_pages_slowpath because, ostensibly, no suitable block
could be found on the free lists. The memory information dump seems to bear
this out, showing zero free blocks of 128 KiB or larger:

    Mem-info:
    Normal per-cpu:
    CPU    0: hi:   18, btch:   3 usd:   0
    active_anon:160 inactive_anon:610 isolated_anon:0
     active_file:7364 inactive_file:3946 isolated_file:0
     unevictable:0 dirty:0 writeback:0 unstable:0
     free:468 slab_reclaimable:257 slab_unreclaimable:1146
     mapped:1611 shmem:6 pagetables:69 bounce:0
    Normal free:1872kB min:1016kB low:1268kB high:1524kB active_anon:640kB
            inactive_anon:2440kB active_file:29456kB inactive_file:15784kB
            unevictable:0kB isolated(anon):0kB isolated(file):0kB
            present:65024kB mlocked:0kB dirty:0kB writeback:0kB
            mapped:6444kB shmem:24kB slab_reclaimable:1028kB
            slab_unreclaimable:4584kB kernel_stack:368kB pagetables:276kB
            unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
            all_unreclaimable? no
    lowmem_reserve[]: 0 0
    Normal: 58*4kB 15*8kB 25*16kB 21*32kB 7*64kB 0*128kB 0*256kB 0*512kB
            0*1024kB 0*2048kB 0*4096kB = 1872kB
    11316 total pagecache pages
    0 pages in swap cache
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap  = 0kB
    Total swap = 0kB
    16384 pages of RAM
    562 free pages
    1915 reserved pages
    1403 slab pages
    4075 pages shared
    0 pages swap cached

Ostensibly this occurs because of memory fragmentation: the lower-order
blocks that are available are not contiguous, so they cannot be merged into
a single order-5 (128 KiB) block.
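
To make the reading of the "Normal:" line above concrete, here is a small user-space sketch that totals the per-order free counts and checks whether a block of a given order is available. It assumes 4 KiB pages and deliberately ignores watermarks and everything else the real page allocator considers. SKETCH_MAX_ORDER and both function names are hypothetical.

```c
#include <stddef.h>

/* Orders 0..10, matching the 4kB..4096kB columns in the dump. */
#define SKETCH_MAX_ORDER 11

/* Total free memory in KiB across all orders, assuming 4 KiB pages. */
unsigned long buddy_total_free_kib(const unsigned int counts[SKETCH_MAX_ORDER])
{
    unsigned long total = 0;
    unsigned int order;

    for (order = 0; order < SKETCH_MAX_ORDER; order++)
        total += (unsigned long)counts[order] * (4ul << order);
    return total;
}

/*
 * Nonzero if at least one free block of `order` or larger exists.
 * A larger block could be split, so higher orders count as well.
 */
int buddy_can_satisfy(const unsigned int counts[SKETCH_MAX_ORDER],
                      unsigned int order)
{
    unsigned int o;

    for (o = order; o < SKETCH_MAX_ORDER; o++)
        if (counts[o] > 0)
            return 1;
    return 0;
}
```

Fed the counts from the dump (58, 15, 25, 21, 7, then zeros), the total comes to 1872 KiB, yet no block of order 5 or larger is free, so a 128 KiB request has nothing to draw from despite nearly 2 MiB being nominally free.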

As an experiment, I call:

    sync
    sysctl -w vm.drop_caches=3

and free memory changes accordingly:

    System free memory is currently 10,004 KiB.
    System free memory is now 39,428 KiB.

as reported by /proc/meminfo, before running:

    fw_setenv foo bar

and still see the page allocation failure.

The system is currently configured with the SLAB allocator. Has anyone found
better fragmentation and low-memory performance with the default SLUB or
embedded SLOB allocators? How about tweaking:

    vm.min_free_kbytes
    vm.vfs_cache_pressure

Anyone met with success there?

Best,

Grant Erickson


* [RFC] Removing FIXMEs in mtdchar.c: mtd_{read,write} (was Re: Recommendations on System Tuning for JFFS2/MTD 128 KiB Block Page Allocation Failures)
  2011-03-16 17:28 Recommendations on System Tuning for JFFS2/MTD 128 KiB Block Page Allocation Failures Grant Erickson
@ 2011-04-01 15:26 ` Grant Erickson
  0 siblings, 0 replies; 2+ messages in thread
From: Grant Erickson @ 2011-04-01 15:26 UTC (permalink / raw)
  To: linux-mtd; +Cc: David Woodhouse, Artem Bityutskiy

On 3/16/11 10:28 AM, Grant Erickson wrote:
> I've an OMAP3 ARM-based embedded system with 256 MiB of NAND flash and 64 MiB
> of RAM on Linux 2.6.32 in which both sys_mount (via mount) and sys_read (via
> fw_setenv) occasionally fail with "page allocation failure. order:5,
> mode:0xd0".
> 
> In the analysis I've done so far, sys_mount funnels down to jffs2_scan_medium
> which eventually calls kmalloc with a size of 128 KiB and flag GFP_KERNEL:
> 
>     sw/tps/linux/linux/fs/jffs2/scan.c:
>     ...
>             /* Respect kmalloc limitations */
>             if (buf_size > 128*1024)
>                 buf_size = 128*1024;
> 
>             D1(printk(KERN_DEBUG "Allocating readbuf of %d bytes\n",
>                       buf_size));
>             flashbuf = kmalloc(buf_size, GFP_KERNEL);
>             if (!flashbuf)
>                 return -ENOMEM;
>         }
>     ...
> 
> The sys_read case winds down to mtd_read which eventually calls kmalloc with a
> size of 128 KiB (to cover a single NAND erase block) and flag GFP_KERNEL:
> 
>     sw/tps/linux/linux/drivers/mtd/mtdchar:
>     ...
>         if (count > MAX_KMALLOC_SIZE)
>             kbuf = kmalloc(MAX_KMALLOC_SIZE, GFP_KERNEL);
>         else
>             kbuf = kmalloc(count, GFP_KERNEL);
>     ...
> 
> Ostensibly this occurs because of memory fragmentation where any of the lower
> order blocks that are available must be non-contiguous.
> 
> ...
>
> The system is currently configured with the SLAB allocator. Has anyone found
> better fragmentation and low-memory performance with the default SLUB or
> embedded SLOB allocators? How about tweaking:
> 
>     vm.min_free_kbytes
> 
> Anyone met with success there?

For anyone following this thread, FWIW, I was able to reduce, but not
eliminate, the likelihood of this issue occurring by setting the following
in sysctl.conf:

    vm.min_free_kbytes = 2048

However, this problem is all about fragmentation, so a solution such as this
will never really guarantee that this problem goes away entirely.

In the meantime, I've been contemplating a more permanent solution in
mtdchar.c:mtd_{read,write} and, before pressing ahead with either, wanted to
get some feedback on which might have upstream integration support.

1) Simpler but Use More Memory

This approach keeps the code, more or less, as is; however, rather than
failing outright, it continues to attempt to allocate smaller and smaller
blocks (dividing by two each time) until either it succeeds or hits the
minimum (either count or PAGE_SIZE).

2) More Complex but Use Less Memory

This approach seems to be what was intimated in the "FIXME" comments from
2005 found in this code and maps in and pins user pages for the read or
write request using get_user_pages. Where possible, adjacent pages are
grouped together into iovec extents and then mtd_{read,write} are called in
a loop for each iovec covering that extent of mapped pages in a manner
similar to {read,write}v having been called from user space.
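
Approach 1 above can be illustrated in user space. This is a minimal sketch, not a patch against mtdchar.c: the allocator is passed in as a hypothetical hook (in the driver, kmalloc with GFP_KERNEL would play that role), and sketch_fragmented_alloc is a stand-in that simulates fragmentation by refusing any request larger than sketch_largest_block. All names here are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook; kmalloc() would fill this role in the driver. */
typedef void *(*sketch_alloc_fn)(size_t size);

/* Simulated fragmentation: requests above this many bytes fail. */
size_t sketch_largest_block = 64 * 1024;

/* Stand-in allocator that refuses anything above sketch_largest_block. */
void *sketch_fragmented_alloc(size_t size)
{
    if (size > sketch_largest_block)
        return NULL;
    return malloc(size);
}

/*
 * Try to allocate *size bytes. On failure, halve the request and retry
 * until an attempt succeeds or the request has reached min_size.
 * On success, *size is updated to the size actually allocated.
 * Returns NULL, leaving *size unchanged, if every attempt fails.
 */
void *alloc_shrinking(size_t *size, size_t min_size, sketch_alloc_fn alloc)
{
    size_t try_size = *size;

    while (try_size > 0) {
        void *buf = alloc(try_size);

        if (buf) {
            *size = try_size;
            return buf;
        }
        if (try_size <= min_size)
            break;              /* already at the floor; give up */
        try_size /= 2;
        if (try_size < min_size)
            try_size = min_size;
    }
    return NULL;
}
```

With a 128 KiB request and a 4 KiB floor, the sketch settles on 64 KiB when the stand-in allocator cannot supply more. The caller would then need to transfer the data in chunks of the returned size rather than assume the full request was granted.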

Comments welcomed.

Regards,

Grant

