* pre6 VM issues
@ 2001-10-09 12:44 Marcelo Tosatti
2001-10-09 12:48 ` Marcelo Tosatti
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 12:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrea Arcangeli, lkml
Hi,
I've been testing pre6 (actually its pre5 a patch which Linus sent me
named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
out some problems. First of all, we need to throttle normal allocators
more often and/or update the low-memory limits for normal allocators to
a saner value. I already said I think allowing everybody to eat down to
"freepages.min" is too low a limit for a default.
I've got atomic memory failures with _22GB_ of swap free (32GB total):
eth0: can't fill rx buffer (force 0)!
Another issue is the damn fork() special case. It's failing in practice:
bash: fork: Cannot allocate memory
Also with _LOTS_ of swap free. (gigs of them)
Linus, we can introduce a "__GFP_FAIL" flag to be used by _everyone_ who
wants to do higher-order allocations as an optimization (eg allocating
big scatter-gather tables or whatever). Or do you prefer to make the
fork() allocation a separate case?
I'll take a closer look at the code now and set the throttling/limits to
what I think is a saner default.
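The "throttle more often / raise the limits" idea amounts to moving the
watermark at which ordinary allocators are forced into reclaim. A userspace
model of that decision (the names and the structure are invented for
illustration; this is not the 2.4 allocator):

```c
/* Toy model of a per-zone allocation watermark. All names here are
 * illustrative, not taken from the 2.4 source. */
struct zone_model {
    long free_pages;   /* pages currently free in the zone */
    long pages_min;    /* the "freepages.min" floor */
};

/* 1 = hand out a page now, 0 = throttle the caller into reclaim.
 * Raising pages_min (the "saner default" above) makes ordinary
 * allocators hit reclaim earlier, leaving more slack for GFP_ATOMIC
 * callers such as the rx-buffer refill that failed above. */
int should_alloc(const struct zone_model *z)
{
    return z->free_pages > z->pages_min;
}
```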
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: pre6 VM issues
2001-10-09 12:44 pre6 VM issues Marcelo Tosatti
@ 2001-10-09 12:48 ` Marcelo Tosatti
2001-10-09 14:17 ` BALBIR SINGH
` (2 subsequent siblings)
3 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 12:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrea Arcangeli, lkml

On Tue, 9 Oct 2001, Marcelo Tosatti wrote:

> > Hi,
> >
> > I've been testing pre6 (actually its pre5 a patch which Linus sent me
> > named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found

I haven't woken up properly yet, I guess. I mean it's pre6 with a patch
named "p5p6" which Linus sent me.
* Re: pre6 VM issues
2001-10-09 12:44 pre6 VM issues Marcelo Tosatti
2001-10-09 12:48 ` Marcelo Tosatti
@ 2001-10-09 14:17 ` BALBIR SINGH
2001-10-09 13:01 ` Marcelo Tosatti
2001-10-09 14:31 ` Andrea Arcangeli
2001-10-09 14:50 ` Andrea Arcangeli
3 siblings, 1 reply; 20+ messages in thread
From: BALBIR SINGH @ 2001-10-09 14:17 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, Andrea Arcangeli, lkml

Most of the traditional Unices maintained a pool for each subsystem
(this is really useful when you have the memory to spare), so no matter
what, they use memory only from their own pool (and peek outside it if
needed), but nobody else uses memory from that pool.

I have seen cases where I have run out of physical memory on my system,
so I try to log in using the serial console, but since the serial driver
does get_free_page() (which most likely fails), the driver just
complains back. So I had suggested a while back that important
subsystems should maintain their own pools (it will take a new thread to
discuss the right size of each pool).

Why can't Linux follow the same approach, especially on systems with a
lot of memory?

Balbir

Marcelo Tosatti wrote:
> First of all, we need to throttle normal allocators
> more often and/or update the low memory limits for normal allocators to a
> saner value. [...]
>
> I've got atomic memory failures with _22GB_ of swap free (32GB total):
>
> eth0: can't fill rx buffer (force 0)!
>
> Another issue is the damn fork() special case. Its failing in practice:
>
> bash: fork: Cannot allocate memory
* Re: pre6 VM issues
2001-10-09 14:17 ` BALBIR SINGH
@ 2001-10-09 13:01 ` Marcelo Tosatti
2001-10-09 14:37 ` BALBIR SINGH
0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 13:01 UTC (permalink / raw)
To: BALBIR SINGH; +Cc: Linus Torvalds, Andrea Arcangeli, lkml

On Tue, 9 Oct 2001, BALBIR SINGH wrote:

> Most of the traditional unices maintained a pool for each subsystem
> [...]
>
> Why can't Linux follow the same approach? especially on systems with a lot
> of memory.

There is nothing which prevents us from doing that (there is one
reserved pool I remember right now: the highmem bounce-buffering pool,
but that one is a special case due to the way Linux does I/O to high
memory, and it's only needed in _real_ emergencies --- it will be
removed in 2.5, I hope).

In general, it's a better approach to share the memory and have a
unified pool. If a given subsystem is not using its own "reserved"
memory, other subsystems can use it.

The problem we are seeing now can be fixed even without the reserved
pools.
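For reference, later kernels did grow a generic form of the per-subsystem
reserve Balbir describes (the mempool idea). The mechanism can be sketched in
userspace, with malloc() standing in for the shared page allocator; every
name here is invented for illustration:

```c
#include <stdlib.h>

/* Toy fixed-size reserve: take from the shared allocator first, and
 * fall back to a private pool only when the shared one is exhausted. */
#define POOL_SLOTS 4

struct reserve_pool {
    void *slot[POOL_SLOTS];
    int nfree;
    size_t objsize;
};

/* Pre-fill the reserve while memory is plentiful. Returns 0 on success. */
int pool_init(struct reserve_pool *p, size_t objsize)
{
    p->objsize = objsize;
    p->nfree = 0;
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->slot[i] = malloc(objsize);
        if (!p->slot[i])
            return -1;
        p->nfree++;
    }
    return 0;
}

void *pool_alloc(struct reserve_pool *p)
{
    void *obj = malloc(p->objsize);   /* shared memory first */
    if (!obj && p->nfree > 0)
        obj = p->slot[--p->nfree];    /* emergency reserve */
    return obj;
}

void pool_free(struct reserve_pool *p, void *obj)
{
    if (p->nfree < POOL_SLOTS)
        p->slot[p->nfree++] = obj;    /* refill the reserve first */
    else
        free(obj);
}
```

This captures both halves of the thread's argument: while memory is
plentiful, the reserve sits unused (Marcelo's objection), but an atomic
caller always has POOL_SLOTS objects it can count on (Balbir's point).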
* Re: pre6 VM issues
2001-10-09 13:01 ` Marcelo Tosatti
@ 2001-10-09 14:37 ` BALBIR SINGH
2001-10-09 13:22 ` Marcelo Tosatti
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: BALBIR SINGH @ 2001-10-09 14:37 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, Andrea Arcangeli, lkml

Marcelo Tosatti wrote:
> In general, its a better approach to share the memory and have a unified
> pool. If a given subsystem is not using its own "reversed" memory, another
> subsystems can use it.
>
> The problem we are seeing now can be fixed even without the reserved
> pools.

I agree that is the fair and nice thing to do, but I was talking about
reserving memory for a device vs. sharing it with a user process: user
processes can wait, and their pages can even be swapped out if needed.
But for a device that is not willing to wait (GFP_ATOMIC), say in
interrupt context, this might be an issue.

Anyway, how do you plan to solve this?

Balbir
* Re: pre6 VM issues
2001-10-09 14:37 ` BALBIR SINGH
@ 2001-10-09 13:22 ` Marcelo Tosatti
2001-10-09 14:43 ` BALBIR SINGH
2001-10-09 14:44 ` Andrea Arcangeli
2 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 13:22 UTC (permalink / raw)
To: BALBIR SINGH; +Cc: Linus Torvalds, Andrea Arcangeli, lkml, bcrl

On Tue, 9 Oct 2001, BALBIR SINGH wrote:

> I agree that is the fair and nice thing to do, but I was talking about reserving
> memory for device vs sharing it with a user process, user processes can wait,
> their pages can even be swapped out if needed. But for a device that is not willing
> to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.
>
> Anyway, how do you plan to solve this ?

I plan to have saner limits for atomic allocations for 2.4. For the
corner cases, we can then make those limits tunable.

For 2.5, I guess we'll need some scheme for those corner cases, since
they will probably become more common (think about gigabit ethernet,
etc). I'm not sure yet which one will be used. Ben (bcrl@redhat.com)
has a nice reservation scheme ready. But that's 2.5-only anyway.
* Re: pre6 VM issues
2001-10-09 14:37 ` BALBIR SINGH
2001-10-09 13:22 ` Marcelo Tosatti
@ 2001-10-09 14:43 ` BALBIR SINGH
2001-10-09 14:44 ` Andrea Arcangeli
2 siblings, 0 replies; 20+ messages in thread
From: BALBIR SINGH @ 2001-10-09 14:43 UTC (permalink / raw)
To: BALBIR SINGH; +Cc: Marcelo Tosatti, Linus Torvalds, Andrea Arcangeli, lkml

BALBIR SINGH wrote:
> I agree that is the fair and nice thing to do, but I was talking about
> reserving memory for device vs sharing it with a user process [...]
>
> Anyway, how do you plan to solve this ?

I did not realize that highmem was causing the problem you were facing;
anyway, my argument about the pools still holds.

Balbir
* Re: pre6 VM issues
2001-10-09 14:37 ` BALBIR SINGH
2001-10-09 13:22 ` Marcelo Tosatti
2001-10-09 14:43 ` BALBIR SINGH
@ 2001-10-09 14:44 ` Andrea Arcangeli
2001-10-09 14:56 ` BALBIR SINGH
2 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 14:44 UTC (permalink / raw)
To: BALBIR SINGH; +Cc: Marcelo Tosatti, Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 08:07:19PM +0530, BALBIR SINGH wrote:
> their pages can even be swapped out if needed. But for a device that is not willing
> to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.

There's just a reserved pool for atomic allocations. See the __GFP_WAIT
check in __alloc_pages.

Andrea
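The distinction Andrea points at: callers that can sleep (__GFP_WAIT set) are
throttled into reclaim at the watermark, while atomic callers are instead
allowed to dip into a small reserve below it. A toy model of that branch
(the constant names and the reserve fraction are illustrative, not the 2.4
__alloc_pages code):

```c
#define MODEL_GFP_WAIT 0x01   /* caller may sleep and run reclaim */

enum alloc_result { ALLOC_OK, ALLOC_FAIL, ALLOC_RECLAIM };

/* What to do for a request when the zone has `free` pages and a `min`
 * watermark. Illustrative only. */
enum alloc_result alloc_path(long free, long min, unsigned int gfp)
{
    if (free > min)
        return ALLOC_OK;                  /* above the watermark: easy */
    if (!(gfp & MODEL_GFP_WAIT))
        /* Atomic caller: may eat into a reserve below the watermark. */
        return free > min / 4 ? ALLOC_OK : ALLOC_FAIL;
    return ALLOC_RECLAIM;                 /* blocking caller: go reclaim */
}
```

The "atomic memory failures with 22GB of swap free" earlier in the thread
correspond to the ALLOC_FAIL branch: swap is irrelevant to a caller that
cannot sleep, so only the size of the reserve matters.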
* Re: pre6 VM issues
2001-10-09 14:44 ` Andrea Arcangeli
@ 2001-10-09 14:56 ` BALBIR SINGH
0 siblings, 0 replies; 20+ messages in thread
From: BALBIR SINGH @ 2001-10-09 14:56 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Marcelo Tosatti, Linus Torvalds, lkml

Andrea Arcangeli wrote:
> There's just a reserved pool for atomic allocations. See the __GFP_WAIT
> check in __alloc_pages.

I apologize for my ignorance on this.

Balbir
* Re: pre6 VM issues
2001-10-09 12:44 pre6 VM issues Marcelo Tosatti
2001-10-09 12:48 ` Marcelo Tosatti
2001-10-09 14:17 ` BALBIR SINGH
@ 2001-10-09 14:31 ` Andrea Arcangeli
2001-10-09 13:13 ` Marcelo Tosatti
2001-10-09 13:23 ` Marcelo Tosatti
2001-10-09 14:50 ` Andrea Arcangeli
3 siblings, 2 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 14:31 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
> I'll take a closer look at the code now and make the throttling/limits to
> what I think is saner for a default.

I also finished, last night, fixing all the highmem troubles that I
could reproduce on 128 MB with highmem emulation. I'm confident it will
work fine on real highmem too now; I hope to get access to a highmem
machine soon to test it.

I guess you're not interested in testing my patches since they're not in
the mainline direction, though.

Andrea
* Re: pre6 VM issues
2001-10-09 14:31 ` Andrea Arcangeli
@ 2001-10-09 13:13 ` Marcelo Tosatti
2001-10-09 14:42 ` Andrea Arcangeli
1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 13:13 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, lkml

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> I guess you're not interested to test my patches since they're not in
> the mainline direction though.

Why are they not in the mainline direction? Are they hackish?
* Re: pre6 VM issues
2001-10-09 13:13 ` Marcelo Tosatti
@ 2001-10-09 14:42 ` Andrea Arcangeli
0 siblings, 0 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 14:42 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 11:13:07AM -0200, Marcelo Tosatti wrote:
> Why they are not in the mainline direction ?
>
> Are they hackish ?

IMHO it's the other way around. First of all, I'm not using the infinite
loop, and I dropped in a few bits ready for doing a few different things
in the next days, like selecting the process to kill as a function of
the allocation rate, collecting away exclusive pages in get_swap_page(),
etc. I'll release the stuff soon, as usual in separate, easily readable
and mergeable patches, because as said I cannot find anything wrong
anymore in the allocator with my testing resources.

Of course I'd really like it if you could test it on the 16GB box, but
as said it won't test the approach to the allocator-failure fixes that
has been implemented in mainline, which I understood you're working on
at the moment.

Andrea
* Re: pre6 VM issues
2001-10-09 14:31 ` Andrea Arcangeli
2001-10-09 13:13 ` Marcelo Tosatti
@ 2001-10-09 13:23 ` Marcelo Tosatti
2001-10-09 14:53 ` Andrea Arcangeli
1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 13:23 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, lkml

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> I've also finished last night to fix all highmem troubles that I could
> reproduce on 128mbyte with highmem emulation, I'm confidetn it will work
> fine on real highmem too now, I hope to get access soon to some highmem
> machine too to test it.
>
> I guess you're not interested to test my patches since they're not in
> the mainline direction though.

Ah, I forgot something: even if I'm not interested in the patches, the
16GB machine is available to the community. If you (or any other VM
people who need the machine) want access, just tell me.
* Re: pre6 VM issues
2001-10-09 13:23 ` Marcelo Tosatti
@ 2001-10-09 14:53 ` Andrea Arcangeli
0 siblings, 0 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 14:53 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 11:23:24AM -0200, Marcelo Tosatti wrote:
> machine is available to the community. If you (or any other VM people who
> need the machine) want access, just tell me.

I'd like to get a login. I think my project has been approved and we'll
soon get an additional machine to test on (that doesn't hurt), but in
the meantime I'd just be interested to run some tests on real highmem,
of course.

Andrea
* Re: pre6 VM issues
2001-10-09 12:44 pre6 VM issues Marcelo Tosatti
` (2 preceding siblings ...)
2001-10-09 14:31 ` Andrea Arcangeli
@ 2001-10-09 14:50 ` Andrea Arcangeli
2001-10-09 13:34 ` Marcelo Tosatti
3 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 14:50 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
> Another issue is the damn fork() special case. Its failing in practice:
>
> bash: fork: Cannot allocate memory
>
> Also with _LOTS_ of swap free. (gigs of them)

It could be just fragmentation, but the fact that it doesn't happen
without highmem pretty much shows that the memory balancing isn't doing
the right thing. You hide the problem with the infinite loop for
non-atomic order-0 allocations, and that's just broken; at best it will
be slower in collecting the right pages away.

My approach shouldn't fail so easily in fork despite my not looping in
fork either, because I'm trying to make better decisions in the memory
balancing in the first place; I don't wait for the infinite loop to
eventually collect away the right pages.

Andrea
* Re: pre6 VM issues
2001-10-09 14:50 ` Andrea Arcangeli
@ 2001-10-09 13:34 ` Marcelo Tosatti
2001-10-09 15:39 ` Andrea Arcangeli
0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 13:34 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, lkml

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> It could be just fragmentation but the fact it doesn't happen in
> non-highmem pretty much shows that shows the memory balancing isn't
> doing the right thing, you hide the problem with the infinite loop for
> non atomic order 0 allocations and that's just broken, as best it will
> be slower in collecting the right pages away.
>
> My approch shouldn't fail so easily in fork despite I'm not looping in
> fork either, because I'm trying to do better decisions since the first
> place in the memory balancing, I don't wait the infinite loop to
> eventually collect away the right pages.

The problem may well be in the memory balancing, Andrea, but I'm not
trying to hide it with the infinite loop. The infinite loop is just a
guarantee that we'll have a reliable way of throttling the allocators
which can block.

Not doing the infinite loop is just way too fragile IMO, and it is
_prone_ to fail under intensive loads.

If the problem is the highmem balancing, I'd love to get your fixes and
integrate them with the infinite-loop logic, which is a separate
(related, yes, but separate) thing.
* Re: pre6 VM issues 2001-10-09 13:34 ` Marcelo Tosatti @ 2001-10-09 15:39 ` Andrea Arcangeli 2001-10-09 15:08 ` Marcelo Tosatti 0 siblings, 1 reply; 20+ messages in thread From: Andrea Arcangeli @ 2001-10-09 15:39 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml On Tue, Oct 09, 2001 at 11:34:47AM -0200, Marcelo Tosatti wrote: > The problem may well be in the memory balancing Andrea, but I'm not trying > to hide it with the infinite loop. I assumed fixing the oom faliures with highmem was the main reason of the infinite loop. > The infinite loop is just a guarantee that we'll have a reliable way of > throttling the allocators which can block. Not doing the infinite loop is Throttling have nothing to do with the infinite loop. > just way too fragile IMO and it is _prone_ to fail in intensive > loads. It is too fragile if the vm is doing the wrong actions and so we must loop over and over again before it finally does the right thing. If allocation fails that's a nice feedback that tell us "the memory balancing is at least inefficient in doing the right thing, looping would only waste more cache and more time for the allocation". Think a list where pages can be only freeable or unfreeable. Now scan _all_ the pages and free all the freeable ones. Finished. If it failed and it couldn't free anything it means there was nothing to free so we're oom. How can that be "fragile"? In real life it isn't as simple as that, there's some "race" effect caming from the schedules in between, there are multiple lists, there's swapout etc... so it's a little more complex than just "freeable" and "unfreeable" and a single list, but it can be done, 2.2 does that too, if we loop over and over again and we do no progress in the right direction I prefer to know about that via an allocation faliure rather than by just getting sucking performance. Also an allocation faliure is a minor problem compared to a deadlock that the infinite loop cannot prevent. 
> If the problem is the highmem balancing, I'd love to get your fixes
> and integrate them with the infinite loop logic, which is a separate
> (related, yes, but separate) thing.

The infinite loop shouldn't do anything except introduce deadlocks
after that (otherwise it means I failed :), but you're free to go in
your direction if you think it's the right one, of course (like I'm
free to go in my direction, since I think it's the right one).

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
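[Editor's note: the freeable/unfreeable single-pass model Andrea describes above can be sketched as a self-contained userspace simulation. All names (`struct page`, `scan_and_free`) are illustrative stand-ins, not the actual 2.4 VM code.]

```c
#include <assert.h>
#include <stddef.h>

/* One page in a simplified reclaim list: under Andrea's model a page
 * is either freeable right now or it is not.  No races, no locked
 * pages -- that is exactly the simplification he concedes below. */
struct page {
    int freeable;          /* can this page be reclaimed right now? */
    int freed;             /* set once the scan has reclaimed it */
    struct page *next;
};

/* A single full pass over the list, freeing everything freeable.
 * Returns the number of pages freed; 0 means nothing was reclaimable,
 * i.e. genuinely out of memory under this simplified model. */
static int scan_and_free(struct page *head)
{
    int freed = 0;
    struct page *p;

    for (p = head; p != NULL; p = p->next) {
        if (p->freeable && !p->freed) {
            p->freed = 1;
            freed++;
        }
    }
    return freed;          /* 0 == nothing to free == oom */
}
```

In this model a second pass that frees nothing is a definitive oom verdict, which is why Andrea sees an allocation failure as meaningful feedback rather than fragility.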
* Re: pre6 VM issues
  2001-10-09 15:39         ` Andrea Arcangeli
@ 2001-10-09 15:08           ` Marcelo Tosatti
  2001-10-09 16:49             ` Andrea Arcangeli
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2001-10-09 15:08 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, lkml

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> On Tue, Oct 09, 2001 at 11:34:47AM -0200, Marcelo Tosatti wrote:
> > The problem may well be in the memory balancing Andrea, but I'm not trying
> > to hide it with the infinite loop.
>
> I assumed fixing the oom failures with highmem was the main reason for
> the infinite loop.
>
> > The infinite loop is just a guarantee that we'll have a reliable way of
> > throttling the allocators which can block. Not doing the infinite loop is
>
> Throttling has nothing to do with the infinite loop.

Sorry, but the infinite loop does throttle: it keeps the allocator in
page reclamation until there is enough memory for the allocating
process to go on.

> > just way too fragile IMO and it is _prone_ to fail in intensive
> > loads.
>
> It is only too fragile if the vm is doing the wrong actions, so that
> we must loop over and over again before it finally does the right
> thing.
>
> If an allocation fails, that's nice feedback that tells us "the memory
> balancing is at least inefficient at doing the right thing; looping
> would only waste more cache and more time for the allocation".
>
> Think of a list where pages can only be freeable or unfreeable. Now
> scan _all_ the pages and free all the freeable ones. Finished. If the
> scan couldn't free anything, it means there was nothing to free, so
> we're oom. How can that be "fragile"?

That is fragile IMHO, Andrea. The infinite loop is simple, reliable
logic which has been shown to work (as long as the OOM killer is
working correctly).

> In real life it isn't as simple as that: there's some "race" effect
> coming from the schedules in between, there are multiple lists,
> there's swapout etc...
> so it's a little more complex than just "freeable" and "unfreeable"
> and a single list, but it can be done; 2.2 does that too. If we loop
> over and over again and make no progress in the right direction, I
> prefer to know about that via an allocation failure rather than by
> just getting sucking performance. Also, an allocation failure is a
> minor problem compared to a deadlock that the infinite loop cannot
> prevent.

If the OOM killer is doing its job correctly, a deadlock will not
happen.

> > If the problem is the highmem balancing, I'd love to get your fixes
> > and integrate them with the infinite loop logic, which is a separate
> > (related, yes, but separate) thing.
>
> The infinite loop shouldn't do anything except introduce deadlocks
> after that (otherwise it means I failed :), but you're free to go in
> your direction if you think it's the right one, of course (like I'm
> free to go in my direction, since I think it's the right one).

Sure. :)

^ permalink raw reply	[flat|nested] 20+ messages in thread
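[Editor's note: the "infinite loop" allocator Marcelo defends can be simulated in a few lines of userspace C. The counters and the `reclaim()` stub are stand-ins for the real 2.4 code paths, and the sketch deliberately preserves the property Andrea objects to: if reclaim can never make progress and the OOM killer never fires, the loop spins forever.]

```c
#include <assert.h>

static int free_pages  = 0;   /* pages on the free list */
static int reclaimable = 3;   /* pages the VM can still reclaim */

/* Stand-in for one round of page reclamation: moves at most one
 * reclaimable page to the free list.  Returns 1 on progress. */
static int reclaim(void)
{
    if (reclaimable > 0) {
        reclaimable--;
        free_pages++;
        return 1;
    }
    return 0;
}

/* The "infinite loop" allocator: it has no failure path at all.  It
 * throttles the caller by keeping it in reclaim until a page exists.
 * If reclaim() stops making progress this never returns -- which is
 * the deadlock scenario Andrea keeps pointing at. */
static int alloc_page_looping(void)
{
    for (;;) {
        if (free_pages > 0) {
            free_pages--;
            return 1;      /* success is the only exit */
        }
        reclaim();
    }
}
```

The debate in this thread is precisely whether "success is the only exit" is a throttling guarantee (Marcelo) or a hidden deadlock plus a masked balancing bug (Andrea).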
* Re: pre6 VM issues
  2001-10-09 15:08           ` Marcelo Tosatti
@ 2001-10-09 16:49             ` Andrea Arcangeli
  2001-10-09 17:07               ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2001-10-09 16:49 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml

On Tue, Oct 09, 2001 at 01:08:14PM -0200, Marcelo Tosatti wrote:
> Sorry, but the infinite loop does throttle: it keeps the allocator in
> page reclamation until there is enough memory for the allocating
> process to go on.

If you think we're missing throttling and you add the infinite loop,
yes, you'll hide the lack of throttling by looping at full cpu speed
rather than using the cpu for more useful things, but that doesn't mean
that the looping in itself is adding throttling; a loop can't add
throttling.

> > > just way too fragile IMO and it is _prone_ to fail in intensive
> > > loads.
> >
> > It is only too fragile if the vm is doing the wrong actions, so that
> > we must loop over and over again before it finally does the right
> > thing.
> >
> > If an allocation fails, that's nice feedback that tells us "the
> > memory balancing is at least inefficient at doing the right thing;
> > looping would only waste more cache and more time for the
> > allocation".
> >
> > Think of a list where pages can only be freeable or unfreeable. Now
> > scan _all_ the pages and free all the freeable ones. Finished. If
> > the scan couldn't free anything, it means there was nothing to free,
> > so we're oom. How can that be "fragile"?
>
> That is fragile IMHO, Andrea.

Mind explaining "why"? Of course you can't, because it isn't fragile,
period. If you have a list whose elements can be freeable or
unfreeable, and you scan the whole list with all the locks held,
freeing everything freeable that you find in your way, then you know
that if you didn't free anything after the scan completed, you're oom.
As said, the real world is more complex, but the example above is
really obvious.
> The infinite loop is simple, reliable logic which has been shown to
> work (as long as the OOM killer is working correctly).

The infinite loop adds oom deadlocks and hides the real problems in the
memory balancing.

> If the OOM killer is doing its job correctly, a deadlock will not
> happen.

I quote my first email about pre4 (I think I CC'ed you too):

".. think if the oom-selected task is looping trying to free memory, it
won't care about the signal you sent to it .."

and that was just a simple case; there are more problems. The above one
can be easily fixed with a simple check for a pending signal within the
loop, which is currently still missing and which you seem not to care
to add, even after I mentioned this exact problem as soon as pre4 was
released (I didn't fix it myself because I'm not using the loop,
because there would be other problems anyway, and because I don't need
the loop just to detect oom).

I think it's useless to keep discussing this: no matter what I say and
what problems I raise, you will keep thinking the loop is the right
way, as far as I can see.

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
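[Editor's note: the specific fix Andrea describes — checking for a pending signal inside the allocation loop so an OOM-killed task can actually die — can be sketched as a userspace simulation. `sig_pending` stands in for the kernel's `signal_pending(current)` test; the names and structure are illustrative, not the real 2.4 code.]

```c
#include <assert.h>

static int free_pages  = 0;
static int sig_pending = 0;   /* stand-in for signal_pending(current) */

/* The allocation loop with the check Andrea says is missing: if the
 * OOM killer has signalled this task, fail the allocation so the task
 * can return to userspace and die, instead of spinning forever in the
 * allocator.  (The reclaim call that would sit in the loop body is
 * omitted here; without the signal check and with nothing reclaimable,
 * this loop would never terminate -- that is the deadlock.) */
static int alloc_page_checked(void)
{
    for (;;) {
        if (free_pages > 0) {
            free_pages--;
            return 1;      /* allocation succeeded */
        }
        if (sig_pending)
            return 0;      /* bail out: let the oom-selected task die */
        /* try_to_free_pages() would run here */
    }
}
```

With the check in place, the loop still throttles normal allocators, but an OOM-killed task gets a failure path out of it.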
* Re: pre6 VM issues
  2001-10-09 16:49             ` Andrea Arcangeli
@ 2001-10-09 17:07               ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2001-10-09 17:07 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Marcelo Tosatti, lkml

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:
>
> If you think we're missing throttling and you add the infinite loop,
> yes, you'll hide the lack of throttling by looping at full cpu speed
> rather than using the cpu for more useful things, but that doesn't
> mean that the looping in itself is adding throttling; a loop can't add
> throttling.

The loop means that the return value of "try_to_free_pages()" basically
becomes meaningless _except_ as a way of telling kswapd that "we are
now having trouble freeing pages, maybe you should check if something
should be killed".

Which means that "try_to_free_pages()" has more freedom in doing
whatever it is it wants to do - it knows that real allocations will
call it again (after having checked whether the process can die).

> > > Think of a list where pages can only be freeable or unfreeable.
> > > Now scan _all_ the pages and free all the freeable ones. Finished.
> > > If the scan couldn't free anything, it means there was nothing to
> > > free, so we're oom. How can that be "fragile"?
> >
> > That is fragile IMHO, Andrea.
>
> Mind explaining "why"? Of course you can't, because it isn't fragile,
> period.

It's fragile because for that to be true, the try_to_free_pages() logic
_has_to_guarantee_ that it looked at every single page, going through
every list it ages _twice_ to get rid of potential accessed bits.

For example, it means that if there are lots of pages that just happen
to be locked due to having pending write-outs on them, you will return
OOM. Even if the system isn't out of memory - it's only temporarily
locked, and what try_to_free_pages() should have done is probably to
wait on a page.
HOWEVER, you cannot afford to wait on a single page with your approach,
because if you wait for pages that you notice are locked, _together_
with the requirement that you have to go through every single list
twice, you'd be totally screwed, and people might wait for a really
long time.

So what do you do? You never wait at all, and just skip locked pages.
Which means that your loop can never throttle, and because you refuse
to see the light about the "endless loop", you can never really even
_start_ throttling on IO without adding more and more special cases.

> The infinite loop adds oom deadlocks and hides the real problems in
> the memory balancing.

You've not shown that to be true. Look at the code, tell us how it
deadlocks.

> I quote my first email about pre4 (I think I CC'ed you too):
>
> ".. think if the oom-selected task is looping trying to free memory,
> it won't care about the signal you sent to it .."

Look again, and read the emails we've sent you. You refuse to listen,
and that's the problem. Check the PF_MEMALLOC logic, and stop
blathering about things that you do not understand.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread
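[Editor's note: Linus's false-OOM scenario — a single-pass scanner that skips locked pages can report "out of memory" while plenty of memory is merely waiting on I/O — can be sketched as a userspace simulation. The `locked` flag stands in for a pending write-out; all names are illustrative, not the real 2.4 VM code.]

```c
#include <assert.h>
#include <stddef.h>

struct page {
    int freeable;          /* reclaimable once any I/O completes */
    int locked;            /* pending write-out: cannot free yet */
    struct page *next;
};

/* A single pass that never waits: locked pages are skipped, because a
 * scan that must cover every list cannot afford to block on each
 * locked page it meets.  The price is that a return value of 0 may be
 * a *false* oom -- every freeable page may simply be under write-out. */
static int scan_skip_locked(struct page *head)
{
    int freed = 0;
    struct page *p;

    for (p = head; p != NULL; p = p->next) {
        if (p->locked)
            continue;      /* skip: don't wait on pending I/O */
        if (p->freeable) {
            p->freeable = 0;
            freed++;
        }
    }
    return freed;
}
```

Running one pass while every page is locked frees nothing; clearing the locks (simulating I/O completion) and rescanning frees them all, which is exactly why Linus argues the infinite loop plus a kswapd hint is safer than treating one failed pass as a definitive oom.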
end of thread, other threads:[~2001-10-09 17:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-09 12:44 pre6 VM issues Marcelo Tosatti
2001-10-09 12:48 ` Marcelo Tosatti
2001-10-09 14:17 ` BALBIR SINGH
2001-10-09 13:01 ` Marcelo Tosatti
2001-10-09 14:37 ` BALBIR SINGH
2001-10-09 13:22 ` Marcelo Tosatti
2001-10-09 14:43 ` BALBIR SINGH
2001-10-09 14:44 ` Andrea Arcangeli
2001-10-09 14:56 ` BALBIR SINGH
2001-10-09 14:31 ` Andrea Arcangeli
2001-10-09 13:13 ` Marcelo Tosatti
2001-10-09 14:42 ` Andrea Arcangeli
2001-10-09 13:23 ` Marcelo Tosatti
2001-10-09 14:53 ` Andrea Arcangeli
2001-10-09 14:50 ` Andrea Arcangeli
2001-10-09 13:34 ` Marcelo Tosatti
2001-10-09 15:39 ` Andrea Arcangeli
2001-10-09 15:08 ` Marcelo Tosatti
2001-10-09 16:49 ` Andrea Arcangeli
2001-10-09 17:07 ` Linus Torvalds