From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bruce Richardson Subject: Re: [PATCH] add free hugepage function Date: Wed, 29 Oct 2014 16:47:18 +0000 Message-ID: <20141029164717.GC7084@bricha3-MOBL3> References: <1414551269-5820-1-git-send-email-haifeng.lin@huawei.com> <533710CFB86FA344BFBF2D6802E60286C7CAAB@SHSMSX101.ccr.corp.intel.com> <20141029034437.GA29486@mhcomputing.net> <533710CFB86FA344BFBF2D6802E60286C7CB42@SHSMSX101.ccr.corp.intel.com> <54508DE1.9090908@huawei.com> <20141029102635.GB8292@BRICHA3-MOBL> <20141029142745.GA14253@localhost.localdomain> <682698A055A0F44AA47192B20141149711B6074C@BGSMSX102.gar.corp.intel.com> <20141029153212.GB14253@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: "dev-VfR2kkLFssw@public.gmane.org" To: Neil Horman Return-path: Content-Disposition: inline In-Reply-To: <20141029153212.GB14253-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" On Wed, Oct 29, 2014 at 11:32:12AM -0400, Neil Horman wrote: > On Wed, Oct 29, 2014 at 03:22:25PM +0000, Ramia, Kannan Babu wrote: > > The problem still remains if one of the process gets abruptly killed,= then the corresponding refcount will not get decremented. There should b= e some scavenging function to be implemented to check the process livelin= ess. =20 > >=20 > Well, abnormal termination results in abnormal consequences. You expec= t > garbage to get left behind of a program crashes, so I wouldn't really w= orry > about that too much. If you really wanted to you can register chained = handlers > for SIGSEGV/SIGBUS/etc to catch those conditions, but honestly, that se= ems like > overkill. If a program that uses shared resources terminates abnormall= y, its > well understood that those shared resources may not get released proper= ly, and > manual intervention is required to clean them up >=20 > Neil If we take this approach we can perhaps reuse a lot of what is already th= ere. The existing code run at startup uses locks as a form of reference counti= ng to determine what hugepages to clean up - safe in the knowledge that a= ny locks are auto-released by the kernel whatever way a process terminate= s. We could also set this code to also run from "atexit()", which means t= hat in a multi-process scenario so long as the final process terminates c= orrectly, the hugepage files will be cleaned up. If any other process ter= minates abruptly, the lock held by that will still be released, and so th= ere will be no problem, and in the worst-case where the final (or perhaps= only) process terminates abnormally, the existing cleanup code will cont= inue to work as it does now by cleaning up at startup. /Bruce >=20 > > regards > > Kannan Babu > >=20 > > -----Original Message----- > > From: dev [mailto:dev-bounces-VfR2kkLFssw@public.gmane.org] On Behalf Of Neil Horman > > Sent: Wednesday, October 29, 2014 7:58 PM > > To: Richardson, Bruce > > Cc: dev-VfR2kkLFssw@public.gmane.org > > Subject: Re: [dpdk-dev] [PATCH] add free hugepage function > >=20 > > On Wed, Oct 29, 2014 at 10:26:35AM +0000, Bruce Richardson wrote: > > > On Wed, Oct 29, 2014 at 02:49:05PM +0800, Linhaifeng wrote: > > > >=20 > > > >=20 > > > > On 2014/10/29 13:26, Qiu, Michael wrote: > > > > > =E5=9C=A8 10/29/2014 11:46 AM, Matthew Hall =E5=86=99=E9=81=93: > > > > >> On Wed, Oct 29, 2014 at 03:27:58AM +0000, Qiu, Michael wrote: > > > > >>> I just saw one return path with value '0', and no any other=20 > > > > >>> place return a negative value, so it is better to be design= ed=20 > > > > >>> as one non-return function, > > > > >>> > > > > >>> +void > > > > >>> +rte_eal_hugepage_free(void) > > > > >>> +{ > > > > >>> + struct hugepage_file *hugepg_tbl =3D g_hugepage_table.hugep= g_tbl; > > > > >>> + unsigned i; > > > > >>> + unsigned nr_hugefiles =3D g_hugepage_table.nr_hugefiles; > > > > >>> + > > > > >>> + RTE_LOG(INFO, EAL, "unlink %u hugepage files\n",=20 > > > > >>> +nr_hugefiles); > > > > >>> + > > > > >>> + for (i =3D 0; i < nr_hugefiles; i++) { > > > > >>> + unlink(hugepg_tbl[i].filepath); > > > > >>> + hugepg_tbl[i].orig_va =3D NULL; > > > > >>> + } > > > > >>> +} > > > > >>> + > > > > >>> > > > > >>> Thanks, > > > > >>> Michael > > > > >> Actually, I don't think that's quite right. > > > > >> > > > > >> http://linux.die.net/man/2/unlink > > > > >> > > > > >> "On success, zero is returned. On error, -1 is returned, and=20 > > > > >> errno is set appropriately." So it should be returning an erro= r,=20 > > > > >> and logging a message for a file it cannot unlink or people wi= ll be surprised with weird failures. > > > > >=20 > > > > > Really need one message for unlink failed, but I'm afraid that = if=20 > > > > > it make sense for return an error code when application exit. > > > > >=20 > > > > > Thanks > > > > > Michael > > > > >> It also had some minor typos / English in the comments but we = can fix that too. > > > > >> > > > > >> Matthew. > > > > >> > > > > >=20 > > > > >=20 > > > > >=20 > > > > Agree.May be it is not need to return error? > > > > -- > > > > Regards, > > > > Haifeng > > > >=20 > > >=20 > > > May I throw out a few extra ideas. > > >=20 > > > The basic problem with DPDK and hugepages on exit, as you have alre= ady=20 > > > diagnosed, is not that the hugepages aren't released for re-use, it= s=20 > > > that the files inside the hugepage directory aren't unlinked. There= =20 > > > are two > > I agree, this patch seems rather unbalanced. Hugepages are initalize= d on startup, not allocated explicitly, and so should not be freed on exi= t. Instead, they should be globally refcounted and (potentially per conf= iguration) freed when the last application to exit drops the refcount to = zero. > >=20 > > > considerations here that prevented us from coming up previously wit= h a=20 > > > solution to this that we were fully happy with. > > > 1. There is no automatic way (at least none that I'm aware of), to=20 > > > have the kernel automatically delete a file on exit. This means tha= t=20 > > > any scheme to delete the files on application termination will have= to=20 > > > be a voluntary one that won't happen if an app crashed or is forcib= ly=20 > > > terminated. That is why the EAL right now does the clean-up on the = restart of the app instead. > > Thats not true. The common POSIX compliant method is to use the atex= it() function to register a function to be called when a process terminat= es. It can be used from rte_eth_hugepage_init so that a refcount is incr= eased there and decreased when the process exits. The atexit registered = function can also check the refcount for zero and, dependent on configura= tion, unlink the hugepage files. That way the application can be complet= ely unaware of a need to do freeing/unlinking > >=20 > > > 2. The other consideration is multi-process. In a multi-process=20 > > > environment, it is unclear when exactly an app is finished - just=20 > > > because one process terminates does not mean that the whole app is=20 > > > finished with the hugepage files. If the files are deleted before a= =20 > > > DPDK secondary process starts up, it won't be able to start as it=20 > > > won't be able to mmap the hugepage memory it needs. > > >=20 > > Refcounting fixes that. If every process takes a reference count on = the hugepage set, then you unlink it when the refcount hits zero. > >=20 > > > Now, that being said, I think we could see about having automatic=20 > > > hugepage deletion controlled by an EAL parameter. That way, the=20 > > > parameter could be omitted in the multi-process case, but could be=20 > > > used for those who want to have a clean hugepage dir after app term= ination. > > >=20 > > > What I think we could do is, instead of having the app save the lis= t=20 > > > of hugepages it uses as the app initializes, is to have the app del= ete=20 > > > the hugepage files as soon as it closes the filehandle for them. If= my=20 > > > understanding is correct, the kernel will actually keep the hugepag= e=20 > > > memory around in the app as usual from that point onwards, and will= =20 > > > only release it back [automatically] on app termination. New proces= ses=20 > > > won't be able to mmap the file, but the running process will be=20 > > > unaffected. The best thing about this approach is that the hugepage= =20 > > > file entries will be deleted whether or not the application termina= tes normally or crashes. > > >=20 > > That doesn't really work in a use case in which processes are created= and removed (i.e. if you have two processes, one exits, then another one= is forked, that third process won't find the hugepage file, because the = second process will have unlinked it on exit). > >=20 > > > So, any thoughts? Would such a scheme work? I haven't prototyped it= ,=20 > > > yet, but I think it should work. > > >=20 > > > Regards, > > > /Bruce > > >=20 > > > PS: As well as multi-process not being able to use this scheme, I'd= =20 > > > also note that this would prevent the dump_cfg utility app - or any= =20 > > > other DPDK analysis app like it - from being used to inspect the=20 > > > memory area of the running process. That is another reason why I th= ink=20 > > > the flag should not be set by default. > > >=20 > > >=20