* Re: I found a synchronization problem in mm/vmalloc.c
From: Nick Piggin @ 2010-01-13 4:55 UTC
To: Yongseok Koh
Cc: 'Andrew Morton', gregkh, vegard.nossum, mingo, penberg, paulmck, torvalds, linux-kernel

On Tue, Jan 12, 2010 at 06:32:09PM +0900, Yongseok Koh wrote:
> Sorry, Mr. Morton.
>
> Even though it is somewhat late, I am now cc'ing the mailing list.
>
> Thanks.
>
> -----Original Message-----
>
> On Thu, 7 Jan 2010 20:22:30 +0900
> "Yongseok Koh" <yongseok.koh@samsung.com> wrote:
>
> > Dear all,
> >
> > I'm Yongseok Koh in Korea.
> >
>
> Thanks for the report.
>
> Please do cc a mailing list when reporting bugs so that everyone else knows
> what's going on.
>
> >
> > I just got a new message in linux-2.6.28.10 (please see below).
> >
> > And one of my colleagues found that there is a synchronization
> > problem in mm/vmalloc.c.
> >
> > In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE
> > first, and then vmap_lazy_nr is increased atomically.
> >
> > But in __purge_vmap_area_lazy(), while traversing vmap_area_list,
> > nr is counted by checking whether VM_LAZY_FREE is set in va->flags.
> >
> > After counting nr, the kernel reads vmap_lazy_nr atomically
> > and checks a BUG_ON condition: whether nr is greater than vmap_lazy_nr.
> >
> > The problem is that, if interrupted right after marking VM_LAZY_FREE,
> > the increment of vmap_lazy_nr can be delayed.
> >
> > Consequently, the BUG_ON condition can be met because nr is counted
> > higher than vmap_lazy_nr.
> >
> > This is highly probable when vmalloc/vfree are called frequently.
> >
> > And my colleagues have verified this scenario by adding a delay between
> > marking VM_LAZY_FREE and increasing vmap_lazy_nr in
> > free_unmap_area_noflush().
> >
> > Am I right?
> >
>
> Looks plausible to me and as far as I can tell, current code has the same
> issue.

Yes, I think it's a good catch.

> Wakey wakey, Nick! What makes that BUG_ON() safe? Not purge_lock afaict?

No, I think it is a bug. I would say that we can just get rid of the BUG_ON
now. atomic_t is signed, so it should be OK if it momentarily goes negative
(and anyway it's only used in a heuristic).

So, thanks for the report. Would you care to send a patch, or propose
another way to fix the problem?

Thanks,
Nick
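
The interleaving described in the report can be replayed in a small user-space sketch. This is only an illustration under stated assumptions, not the kernel code: `struct area`, `lazy_nr`, `mark_lazy_free()`, `account_lazy_free()`, `count_lazy_pages()` and `replay_race()` are hypothetical names, the flag value is arbitrary, and the two "CPUs" are simply calls made in the problematic order on one thread. C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_LAZY_FREE 0x01      /* kernel flag name; the value here is illustrative */

/* Hypothetical stand-in for struct vmap_area: only what the race needs. */
struct area {
    unsigned long flags;
    int pages;
};

static atomic_int lazy_nr;     /* plays the role of vmap_lazy_nr */

/* Free path, step 1: mark the area as lazily freed. */
static void mark_lazy_free(struct area *a)
{
    a->flags |= VM_LAZY_FREE;
}

/* Free path, step 2: account for the pages. The report says this can be delayed. */
static void account_lazy_free(const struct area *a)
{
    atomic_fetch_add(&lazy_nr, a->pages);
}

/* Purge path: count pages of every area that carries VM_LAZY_FREE. */
static int count_lazy_pages(const struct area *list, size_t n)
{
    int nr = 0;
    for (size_t i = 0; i < n; i++)
        if (list[i].flags & VM_LAZY_FREE)
            nr += list[i].pages;
    return nr;
}

/* Replays the reported order; returns true if the old BUG_ON check would fire. */
static bool replay_race(void)
{
    struct area list[1] = { { 0, 8 } };
    atomic_store(&lazy_nr, 0);

    mark_lazy_free(&list[0]);                  /* freeing side: flag is set ...     */
    int nr = count_lazy_pages(list, 1);        /* purging side runs in the gap      */
    bool bug = nr > atomic_load(&lazy_nr);     /* the nr > vmap_lazy_nr condition   */
    account_lazy_free(&list[0]);               /* ... and only now the increment    */

    assert(atomic_load(&lazy_nr) == 8);        /* the counter catches up afterwards */
    return bug;
}
```

Calling `replay_race()` returns true because the purge side has already counted 8 pages while the counter still reads 0, which is the window the report describes.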
* Re: [PATCH] vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE
From: Minchan Kim @ 2010-01-19 12:01 UTC
To: yongseok.koh
Cc: 'Nick Piggin', 'Linus Torvalds', 'Andrew Morton', gregkh, vegard.nossum, 'Ingo Molnar', penberg, paulmck, linux-kernel

On Tue, 2010-01-19 at 17:33 +0900, Yongseok Koh wrote:
> From: Yongseok Koh <yongseok.koh@samsung.com>

You don't need the above line.
We use "From" when we send a patch on behalf of someone else.

>
> In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
> then vmap_lazy_nr is increased atomically.
> But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr is
> counted by checking whether VM_LAZY_FREE is set in va->flags.
> After counting nr, the kernel reads vmap_lazy_nr atomically and checks a
> BUG_ON condition, whether nr is greater than vmap_lazy_nr, to prevent
> vmap_lazy_nr from going negative.
>
> The problem is that, if interrupted right after marking VM_LAZY_FREE,
> the increment of vmap_lazy_nr can be delayed.
> Consequently, the BUG_ON condition can be met because nr is counted higher
> than vmap_lazy_nr.
>
> This is highly probable when vmalloc/vfree are called frequently.
> This scenario has been verified by adding a delay between marking
> VM_LAZY_FREE and increasing vmap_lazy_nr in free_unmap_area_noflush().
>
> Even though vmap_lazy_nr is used for checking the high watermark, it was
> never a strict watermark.
> Although the BUG_ON condition is meant to prevent vmap_lazy_nr from going
> negative, vmap_lazy_nr is a signed variable.
> So it can temporarily go down to a negative value.
>
> Consequently, removing the BUG_ON condition is proper.
>
> A possible BUG_ON message looks like the one below.
>
> kernel BUG at mm/vmalloc.c:517!
> invalid opcode: 0000 [#1] SMP
> EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
> EIP is at __purge_vmap_area_lazy+0x144/0x150
> EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
> ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Call Trace:
> [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
> [<c0482b02>] remove_vm_area+0x22/0x70
> [<c0482c15>] __vunmap+0x45/0xe0
> [<c04831ec>] vmalloc+0x2c/0x30
> Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48
> e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90
> 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
> EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c
>
>
> Signed-off-by: Yongseok Koh <yongseok.koh@samsung.com>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

We discussed this in the following thread:
http://marc.info/?l=linux-kernel&m=126335856228090&w=2

Thanks for your contribution to the Linux kernel, Yongseok. :)

--
Kind regards,
Minchan Kim
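
The argument in the patch description (the counter is signed, so a brief negative value is harmless once the BUG_ON is gone) can be illustrated with a user-space sketch. It assumes, as the description suggests, that the purge side simply subtracts nr with no check. `lazy_nr`, `purge_account()` and `replay_patched()` are hypothetical names, and the interleaving is replayed sequentially rather than on two real CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int lazy_nr;     /* stand-in for vmap_lazy_nr; signed like atomic_t */

/* Purge-side accounting as assumed after the patch: subtract, with no BUG_ON. */
static void purge_account(int nr)
{
    if (nr)
        atomic_fetch_sub(&lazy_nr, nr);
}

/*
 * Replays the delayed-increment order and returns the transient counter
 * value observed between the purge and the late increment.
 */
static int replay_patched(int pages)
{
    atomic_store(&lazy_nr, 0);

    purge_account(pages);                      /* purge counted the flagged area   */
    int transient = atomic_load(&lazy_nr);     /* momentarily negative             */
    atomic_fetch_add(&lazy_nr, pages);         /* delayed increment from free path */

    assert(atomic_load(&lazy_nr) == 0);        /* the books balance in the end     */
    return transient;
}
```

With `replay_patched(8)` the counter dips to -8 and then returns to 0. Since the thread describes vmap_lazy_nr as a heuristic watermark, a transient dip of this kind only makes the threshold check slightly conservative for a moment.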