linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Korolev <akorolex@gmail.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Alexey Korolev <akorolev@infradead.org>, linux-mm@kvack.org
Subject: Re: [RFC][PATCH 1/2] HugeTLB mapping for drivers (Alloc/free for drivers, hstate_nores)
Date: Tue, 14 Jul 2009 23:51:59 +1200	[thread overview]
Message-ID: <202cde0e0907140451h14ecb494xe7a8e7d9c235d538@mail.gmail.com> (raw)
In-Reply-To: <20090714093718.GB28569@csn.ul.ie>

Hi,

> Ok, this makes me raise an eyebrow immediately. Things to watch out for
> are
>
> o allocations made by the driver that are not faulted immediately can
>  potentially fail at fault time if reservations are not made
> o allocations that ignore the userspace reservations and allocate huge
>  pages from the pool potentially cause application failure later

Indeed.
>
> You deal with the latter but the former will depend on how the driver is
> implemented.
>
>> This is different to prototype. Why it is implemented? HugetlbFs and
>> drivers reservations has completely different sources of reservations.
>> In hugetlbfs case it is dictated by users. So it is necessary to bother
>> about restrictions/ quotas etc.
>
> The reservations in the hugetlbfs case are about stability. If hugepages
> were not reserved or allocated at mmap() time, a failure to allocate a
> page at fault time will force-kill the application. Be careful that
> drivers really are immune from the same problem.
>
Dirvers must care about cases when there is no memory. For example failure
to allocate DMA buffer in device driver usually means inaccessible
device. It is
normal to expect module insert failure (or failure of all further
requests) in this case.
Usually applications have information about size of DMA buffer.
If an application requested memory which is out of range it will be
fine to terminate the application, but kernel mustn't fall to panic.
(Oneof test case - will check around it)

>> In driver case it is dictated by HW. In thius case it is necessary involve user
>> in tuning process as less as possible.
>> If we would use HugeTlbFs reservations - we would need to force user to
>> supply how much huge pages needs to be reserved for drivers.
>> To protect drivers to interract with htlbfs reservations the state hstate_nores was
>> introduced.
>
> What does nores mean?
>
nores means no reservation. I.e. if we are in this stage it tells
hugetlb.c functions to
behave as VM_NORESERVE flag is specified.
Initially name was hstate_drv. (but it sounds not so good as well )

>> Reservations with a state hstate_nores should not touch htlb
>> pools.
>>
>
> Ok, that's good, but you still need to be careful in the event you setup
> a mapping that doesn't have associated hugepages allocated.
>
Who setup mapping driver or application?
If driver - it will be rather hard to said what we should do in this
case. I would
prefer to see panic because it means driver did something nasty.

>> +void hugetlb_free_pages(struct page *page)
>> +{
>
> This name is too general. There is nothing to indicate that it is only
> used by drivers.

Acked.
Need to think more about good naming. I will discuss it with
colleagues. It is hard to
choose something good now.

>
>> +     int i;
>> +     struct hstate *h = &hstate_nores;
>> +
>> +     VM_BUG_ON(h->order >= MAX_ORDER);
>> +
>
> This is a perfectly possible condition for you unfortunately in the current
> initialisation of hstate_nores. Nothing stops the default hugepage size being
> set to 1G or 16G on machines that wanted that pagesize used for shared memory
> segments. On such configurations, you should either be failing the allocation
> or having hstate_nores use a smaller hugepage size.
>
Ahhhh! I did not expect this. So it is necessary to be very accurate
when choosing parameters
of hstate_nores. Thanks a lot for pointing to this.

>> +     for (i = 0; i < pages_per_huge_page(h); i++) {
>> +             page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
>> +                     1 << PG_referenced | 1 << PG_dirty | 1 << PG_active |
>> +                     1 << PG_reserved | 1 << PG_private | 1 << PG_writeback);
>> +     }
>> +     set_compound_page_dtor(page, NULL);
>> +     set_page_refcounted(page);
>> +     arch_release_hugepage(page);
>> +     __free_pages(page, huge_page_order(h));
>> +}
>> +EXPORT_SYMBOL(hugetlb_free_pages);
>
> You need to reuse update_and_free_page() somehow here by splitting the
> accounting portion from the page free portion. I know this is a
> prototype but at least comment that it's copied from
> update_and_free_page() for anyone else looking to review this that is
> not familiar with hugetlbfs.
>
Acked. Thanks. Will be updated.
>> +
>>  /* Put bootmem huge pages into the standard lists after mem_map is up */
>>  static void __init gather_bootmem_prealloc(void)
>>  {
>> @@ -1078,7 +1123,13 @@ static void __init hugetlb_init_hstates(
>>               if (h->order < MAX_ORDER)
>>                       hugetlb_hstate_alloc_pages(h);
>>       }
>> +     /* Special hstate for use of drivers, allocations are not
>> +      * tracked by hugetlbfs */
>
> The term "tracked" doesn't really say anything. How about something
> like;
>
> /*
>  * hstate_nores is used by drivers. Allocations are immediate,
>  * there is no hugepage pool and there are no reservations made
>  */
>
Immediate sounds better. Seems naming needs to be tuned more. I'll ask
colleagues
to help here.

Thanks,
Alexey

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2009-07-14 11:21 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-14  1:47 [RFC][PATCH 1/2] HugeTLB mapping for drivers (Alloc/free for drivers, hstate_nores) Alexey Korolev
2009-07-14  9:37 ` Mel Gorman
2009-07-14 11:51   ` Alexey Korolev [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202cde0e0907140451h14ecb494xe7a8e7d9c235d538@mail.gmail.com \
    --to=akorolex@gmail.com \
    --cc=akorolev@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).