From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4E032040.1040906@domain.hid> Date: Thu, 23 Jun 2011 13:15:12 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <4DDA66CF.2010307@domain.hid> <4DFDCBEA.6010905@domain.hid> <4DFDDAB8.4090709@domain.hid> <4DFDF2E1.1060106@domain.hid> <4DFF7E4A.3070807@domain.hid> <4E030956.5030702@domain.hid> <4E031F70.7080304@domain.hid> In-Reply-To: <4E031F70.7080304@domain.hid> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig77D9FFE90578748842004BF4" Sender: jan.kiszka@domain.hid Subject: Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage List-Id: Xenomai life and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gilles Chanteperdrix Cc: Xenomai core This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig77D9FFE90578748842004BF4 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2011-06-23 13:11, Gilles Chanteperdrix wrote: > On 06/23/2011 11:37 AM, Jan Kiszka wrote: >> On 2011-06-20 19:07, Jan Kiszka wrote: >>> On 2011-06-19 15:00, Gilles Chanteperdrix wrote: >>>> On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote: >>>>> On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote: >>>>>> I am working on this ppd cleanup issue again, I am asking for help= to >>>>>> find a fix in -head for all cases where the sys_ppd is needed duri= ng >>>>>> some cleanup. >>>>>> >>>>>> The problem is that when the ppd cleanup is invoked: >>>>>> - we have no guarantee that current is a thread from the Xenomai >>>>>> application; >>>>>> - if it is, current->mm is NULL. >>>>>> >>>>>> So, associating the sys_ppd to either current or current->mm does = not >>>>>> work. What we could do is pass the sys_ppd to all the other ppds c= leanup >>>>>> handlers, this would fix cases such as freeing mutexes fastlock, b= ut >>>>>> that does not help when the sys_ppd is needed during a thread dele= tion hook. >>>>>> >>>>>> I would like to find a solution where simply calling xnsys_ppd_get= () >>>>>> will work, where we do not have an xnsys_ppd_get for each context,= such >>>>>> as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct, >>>>>> because it would be too error-prone. >>>>>> >>>>>> Any idea anyone? >>>>> >>>>> The best I could come up with: use a ptd to store the mm currently = >>>>> being cleaned up, so that xnshadow_ppd_get continues to work, even >>>>> in the middle of a cleanup. >>>> >>>> In order to also get xnshadow_ppd_get to work in task deletion hooks= =20 >>>> (which is needed to avoid the issue at the origin of this thread), w= e=20 >>>> also need to set this ptd upon shadow mapping, so it is still there = >>>> when reaching the task deletion hook (where current->mm may be NULL)= =2E=20 >>>> Hence the patch: >>>> >>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c >>>> index b243600..6bc4210 100644 >>>> --- a/ksrc/nucleus/shadow.c >>>> +++ b/ksrc/nucleus/shadow.c >>>> @@ -65,6 +65,11 @@ int nkthrptd; >>>> EXPORT_SYMBOL_GPL(nkthrptd); >>>> int nkerrptd; >>>> EXPORT_SYMBOL_GPL(nkerrptd); >>>> +int nkmmptd; >>>> +EXPORT_SYMBOL_GPL(nkmmptd); >>>> + >>>> +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd]) >>>> +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t)) >>> >>> xnshadow_mm() can now return a no longer existing mm. So no user of >>> xnshadow_mm should ever dereference that pointer. Thus we better chan= ge >>> all that user to treat the return value as a void pointer e.g. >>> >>>> =20 >>>> struct xnskin_slot { >>>> struct xnskin_props *props; >>>> @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncomplet= ion_t __user *u_completion, >>>> * friends. >>>> */ >>>> xnshadow_thrptd(current) =3D thread; >>>> + xnshadow_mmptd(current) =3D current->mm; >>>> + >>>> rthal_enable_notifier(current); >>>> =20 >>>> if (xnthread_base_priority(thread) =3D=3D 0 && >>>> @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd) >>>> =20 >>>> static inline void do_cleanup_event(struct mm_struct *mm) >>>> { >>>> + struct task_struct *p =3D current; >>>> + struct mm_struct *old; >>>> + >>>> + old =3D xnshadow_mm(p); >>>> + xnshadow_mmptd(p) =3D mm; >>>> + >>>> ppd_remove_mm(mm, &detach_ppd); >>>> + >>>> + xnshadow_mmptd(p) =3D old; >>> >>> I don't have the full picture yet, but that feels racy: If the contex= t >>> over which we clean up that foreign mm is also using xnshadow_mmptd, >>> other threads in that process may dislike this temporary change. >>> >>>> } >>>> =20 >>>> RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event); >>>> @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interfac= e); >>>> xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid) >>>> { >>>> if (xnpod_userspace_p()) >>>> - return ppd_lookup(muxid, current->mm); >>>> + return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm); >>>> =20 >>>> return NULL; >>>> } >>>> @@ -2960,8 +2975,9 @@ int xnshadow_mount(void) >>>> sema_init(&completion_mutex, 1); >>>> nkthrptd =3D rthal_alloc_ptdkey(); >>>> nkerrptd =3D rthal_alloc_ptdkey(); >>>> + nkmmptd =3D rthal_alloc_ptdkey(); >>>> =20 >>>> - if (nkthrptd < 0 || nkerrptd < 0) { >>>> + if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) { >>>> printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n"); >>>> return -ENOMEM; >>>> } >>>> diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c >>>> index 6ce75e5..cc86852 100644 >>>> --- a/ksrc/skins/posix/mutex.c >>>> +++ b/ksrc/skins/posix/mutex.c >>>> @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t= *mutex, >>>> xnlock_put_irqrestore(&nklock, s); >>>> =20 >>>> #ifdef CONFIG_XENO_FASTSYNCH >>>> - /* We call xnheap_free even if the mutex is not pshared; when >>>> - this function is called from pse51_mutexq_cleanup, the >>>> - sem_heap is destroyed, or not the one to which the fastlock >>>> - belongs, xnheap will simply return an error. */ >>> >>> I think this comment is not completely obsolete. It still applies /wr= t >>> shared/non-shared. >>> >>>> xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap, >>>> mutex->synchbase.fastlock); >>>> #endif /* CONFIG_XENO_FASTSYNCH */ >>>> >>>> >>> >>> If we can resolve that potential race, this looks like a nice solutio= n. >> >> We still have to address that ordering issue I almost forgot: >> do_cleanup_event runs before do_task_exit_event when terminating the >> last task. The former destroys the sem heap, the latter fires the dele= te >> hook which then tries to free msendq.fastlock to an invalid heap. >> >> Should be fixable by setting sem_heap NULL in the ppd on destroy and >> skipping the fastlock release in __task_delete_hook if the heap pointe= r >> is found like that. >=20 > I do not think this can be a problem, as the do_cleanup_event will also= > destroy the threads. At least native tasks (but I bet that's true for al skins) aren't part of the XNSHADOW_CLIENT_DETACH cleanup procedure. > Anyway, I just pushed a branch "u_mode" on the > xenomai-gch git with all the work based on this mmptd, could you try an= d > pull it to see if you sill have this cleanup/task_exit issue? Will have a look. Thanks, Jan --------------enig77D9FFE90578748842004BF4 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk4DIEUACgkQitSsb3rl5xSunQCfcE6vTo8sm42rx+25AFfuqX+v wOUAoOunhnPhDl93uIDqbvN3AH9+TZQI =YuE1 -----END PGP SIGNATURE----- --------------enig77D9FFE90578748842004BF4--