From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751649AbdJBNpf (ORCPT ); Mon, 2 Oct 2017 09:45:35 -0400 Received: from shelob.surriel.com ([96.67.55.147]:51089 "EHLO shelob.surriel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751277AbdJBNpd (ORCPT ); Mon, 2 Oct 2017 09:45:33 -0400 Message-ID: <1506951929.21121.121.camel@surriel.com> Subject: Re: [PATCH v2 2/2] pid: Remove pidhash From: Rik van Riel To: Oleg Nesterov , Gargi Sharma Cc: linux-kernel@vger.kernel.org, julia.lawall@lip6.fr, akpm@linux-foundation.org, mingo@kernel.org, pasha.tatashin@oracle.com, ktkhai@virtuozzo.com Date: Mon, 02 Oct 2017 09:45:29 -0400 In-Reply-To: <1506951313.21121.119.camel@surriel.com> References: <20170927154543.GA3788@redhat.com> <1506951313.21121.119.camel@surriel.com> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-mrMSqEbsz6wsdcWuKRNE" X-Mailer: Evolution 3.22.6 (3.22.6-2.fc25) Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-mrMSqEbsz6wsdcWuKRNE Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 2017-10-02 at 09:35 -0400, Rik van Riel wrote: > On Wed, 2017-09-27 at 17:45 +0200, Oleg Nesterov wrote: > > On 09/27, Gargi Sharma wrote: > > >=20 > > > -#define find_next_offset(map, off) =09 > > > =09 > > > \ > > > - find_next_zero_bit((map)->page, BITS_PER_PAGE, > > > off) > > > - > >=20 > > this should go into the previous patch, but this is minor... > >=20 > > > @@ -208,12 +200,10 @@ struct pid *alloc_pid(struct pid_namespace > > > *ns) > > > =C2=A0 > > > =C2=A0 upid =3D pid->numbers + ns->level; > > > =C2=A0 spin_lock_irq(&pidmap_lock); > > > - if (!(ns->nr_hashed & PIDNS_HASH_ADDING)) > > > + if (!(ns->pid_allocated & PIDNS_ADDING)) > > > =C2=A0 goto out_unlock; > > > =C2=A0 for ( ; upid >=3D pid->numbers; --upid) { > > > - hlist_add_head_rcu(&upid->pid_chain, > > > - &pid_hash[pid_hashfn(upid->nr, > > > upid->ns)]); > > > - upid->ns->nr_hashed++; > > > + upid->ns->pid_allocated++; > >=20 > > No, this is wrong. > >=20 > > It is too late to check PIDNS_HASH_ADDING/PIDNS_ADDING and > > increment > > pid_allocated, > > once we call idr_alloc_cyclic() this pid is already "hashed" in > > that > > it can be found > > by find_pid_ns() with this patch applied. > >=20 > > And of course, it is too late to do atomic_set(&pid->count, 1) and > > initialize > > pid->tasks[type] lists by the same reason. >=20 > Hi Oleg, >=20 > Gargi and I are looking at that code, and trying to figure out > exactly what needs to be done to make all of this correct. >=20 > We are thinking something along these lines: .... and we looked at the code some more. Some of the way the code is laid out currently looks like an artifact of history. Lets try this again: 1) pid =3D kmem_cache_alloc(...) 2) get_pid_ns(ns); atomic_set(&pid->count, 1); for (type =3D 0; ...) INIT_HLIST(...) > 3) First, check if this is a new namespace (PIDNS_ADDING), and > =C2=A0=C2=A0=C2=A0do the call to pid_ns_prepare_proc, before we even call= idr_alloc. > =C2=A0=C2=A0=C2=A0Maybe something like: >=20 > =C2=A0=C2=A0=C2=A0if (unlikely(ns->nr_allocated =3D=3D PIDNS_ADDING)) { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (pid_ns_prepare_proc(ns)) { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0disable_pid_allocat= ions(ns); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0goto out_free_ns; > =C2=A0=C2=A0=C2=A0} >=20 > 4) With pid_ns_prepare_proc out of the way, we can put all the code > =C2=A0=C2=A0=C2=A0from below where the call to pid_ns_prepare_proc is now= (except > =C2=A0=C2=A0=C2=A0error handing) into the main loop of pid allocation, so= we can > =C2=A0=C2=A0=C2=A0do all that stuff under the pidmap_lock: >=20 > =C2=A0=C2=A0=C2=A0for (i =3D ns->level; i >=3D 0; i--) { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0idr_alloc_cyclic(...) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0ns->nr_allocated++; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > =C2=A0 } >=20 > Would that resolve your objection, or are we barking up the wrong > tree? Now the per-pid struct stuff is done once, before the loop, and the loop deals only with allocated numbers inside each parent namespace. The error path would unwind stuff done earlier. --=20 All Rights Reversed. --=-mrMSqEbsz6wsdcWuKRNE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJZ0kL5AAoJEM553pKExN6DsOUH/2oE69M0jUI43JIK7GNreZaD z4+04PiBJyr9pwLrhhFm9vi9j+0VqXn9hcrn57VmqpSRSs5vKVSXVLorLAlyScZW Ml6b2xs/7Tl6DH0aBW01CAbyj2IN8SPubRXhlty2jr/+p+4qmkBluVBpPCsxusWU r4gdbgIWiJl1XrK8LE/zNx3bQGdlEyldei4mXSVVn/fDyAPfILbHoiMs9dUs4SJf Dq831gj24+RO+pczUgaKT9l+x5+55oGBguB+PnpDailhH83akgzK4Jtmno2K5N9L RGGkLumyhvzMt6+UI0LhaZiIKsaQsDlfvOtPZM/VzEt8BhSGKmNWhhisF+JLZ+Y= =zB13 -----END PGP SIGNATURE----- --=-mrMSqEbsz6wsdcWuKRNE--