From mboxrd@z Thu Jan 1 00:00:00 1970 From: Neil Horman Subject: Re: [PATCH] add free hugepage function Date: Wed, 29 Oct 2014 10:27:45 -0400 Message-ID: <20141029142745.GA14253@localhost.localdomain> References: <1414551269-5820-1-git-send-email-haifeng.lin@huawei.com> <533710CFB86FA344BFBF2D6802E60286C7CAAB@SHSMSX101.ccr.corp.intel.com> <20141029034437.GA29486@mhcomputing.net> <533710CFB86FA344BFBF2D6802E60286C7CB42@SHSMSX101.ccr.corp.intel.com> <54508DE1.9090908@huawei.com> <20141029102635.GB8292@BRICHA3-MOBL> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: "dev-VfR2kkLFssw@public.gmane.org" To: Bruce Richardson Return-path: Content-Disposition: inline In-Reply-To: <20141029102635.GB8292@BRICHA3-MOBL> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" On Wed, Oct 29, 2014 at 10:26:35AM +0000, Bruce Richardson wrote: > On Wed, Oct 29, 2014 at 02:49:05PM +0800, Linhaifeng wrote: > >=20 > >=20 > > On 2014/10/29 13:26, Qiu, Michael wrote: > > > =E5=9C=A8 10/29/2014 11:46 AM, Matthew Hall =E5=86=99=E9=81=93: > > >> On Wed, Oct 29, 2014 at 03:27:58AM +0000, Qiu, Michael wrote: > > >>> I just saw one return path with value '0', and no any other place= =20 > > >>> return a negative value, so it is better to be designed as one > > >>> non-return function, > > >>> > > >>> +void > > >>> +rte_eal_hugepage_free(void) > > >>> +{ > > >>> + struct hugepage_file *hugepg_tbl =3D g_hugepage_table.hugepg_tb= l; > > >>> + unsigned i; > > >>> + unsigned nr_hugefiles =3D g_hugepage_table.nr_hugefiles; > > >>> + > > >>> + RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles); > > >>> + > > >>> + for (i =3D 0; i < nr_hugefiles; i++) { > > >>> + unlink(hugepg_tbl[i].filepath); > > >>> + hugepg_tbl[i].orig_va =3D NULL; > > >>> + } > > >>> +} > > >>> + > > >>> > > >>> Thanks, > > >>> Michael > > >> Actually, I don't think that's quite right. > > >> > > >> http://linux.die.net/man/2/unlink > > >> > > >> "On success, zero is returned. On error, -1 is returned, and errno= is set=20 > > >> appropriately." So it should be returning an error, and logging a = message for=20 > > >> a file it cannot unlink or people will be surprised with weird fai= lures. > > >=20 > > > Really need one message for unlink failed, but I'm afraid that if i= t > > > make sense for return an error code when application exit. > > >=20 > > > Thanks > > > Michael > > >> It also had some minor typos / English in the comments but we can = fix that too. > > >> > > >> Matthew. > > >> > > >=20 > > >=20 > > >=20 > > Agree.May be it is not need to return error? > > --=20 > > Regards, > > Haifeng > >=20 >=20 > May I throw out a few extra ideas. >=20 > The basic problem with DPDK and hugepages on exit, as you have already=20 > diagnosed, is not that the hugepages aren't released for re-use, its th= at=20 > the files inside the hugepage directory aren't unlinked. There are two=20 I agree, this patch seems rather unbalanced. Hugepages are initalized on startup, not allocated explicitly, and so should not be freed on exit. I= nstead, they should be globally refcounted and (potentially per configuration) fr= eed when the last application to exit drops the refcount to zero. > considerations here that prevented us from coming up previously with a=20 > solution to this that we were fully happy with. > 1. There is no automatic way (at least none that I'm aware of), to have= the=20 > kernel automatically delete a file on exit. This means that any scheme = to=20 > delete the files on application termination will have to be a voluntary= one=20 > that won't happen if an app crashed or is forcibly terminated. That is = why=20 > the EAL right now does the clean-up on the restart of the app instead. Thats not true. The common POSIX compliant method is to use the atexit() function to register a function to be called when a process terminates. = It can be used from rte_eth_hugepage_init so that a refcount is increased there = and decreased when the process exits. The atexit registered function can als= o check the refcount for zero and, dependent on configuration, unlink the hugepag= e files. That way the application can be completely unaware of a need to d= o freeing/unlinking > 2. The other consideration is multi-process. In a multi-process environ= ment,=20 > it is unclear when exactly an app is finished - just because one proces= s=20 > terminates does not mean that the whole app is finished with the hugepa= ge=20 > files. If the files are deleted before a DPDK secondary process starts = up,=20 > it won't be able to start as it won't be able to mmap the hugepage memo= ry it=20 > needs. >=20 Refcounting fixes that. If every process takes a reference count on the hugepage set, then you unlink it when the refcount hits zero. > Now, that being said, I think we could see about having automatic hugep= age=20 > deletion controlled by an EAL parameter. That way, the parameter could = be=20 > omitted in the multi-process case, but could be used for those who want= to=20 > have a clean hugepage dir after app termination. >=20 > What I think we could do is, instead of having the app save the list of= =20 > hugepages it uses as the app initializes, is to have the app delete the= =20 > hugepage files as soon as it closes the filehandle for them. If my=20 > understanding is correct, the kernel will actually keep the hugepage me= mory=20 > around in the app as usual from that point onwards, and will only relea= se it=20 > back [automatically] on app termination. New processes won't be able to= mmap=20 > the file, but the running process will be unaffected. The best thing ab= out=20 > this approach is that the hugepage file entries will be deleted whether= or=20 > not the application terminates normally or crashes. >=20 That doesn't really work in a use case in which processes are created and removed (i.e. if you have two processes, one exits, then another one is f= orked, that third process won't find the hugepage file, because the second proce= ss will have unlinked it on exit). > So, any thoughts? Would such a scheme work? I haven't prototyped it, ye= t,=20 > but I think it should work. >=20 > Regards, > /Bruce >=20 > PS: As well as multi-process not being able to use this scheme, I'd als= o=20 > note that this would prevent the dump_cfg utility app - or any other DP= DK=20 > analysis app like it - from being used to inspect the memory area of th= e=20 > running process. That is another reason why I think the flag should not= be=20 > set by default. >=20 >=20