From: ebiederm@xmission.com (Eric W. Biederman)
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Greg KH <gregkh@linuxfoundation.org>,
Andrei Vagin <avagin@gmail.com>,
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
Vasily Averin <vvs@virtuozzo.com>,
Alexander Mikhalitsyn <alexander@mihalicyn.com>,
stable@vger.kernel.org
Subject: Re: [RFC] shm: extend forced shm destroy to support objects from several IPC nses (simplified)
Date: Sun, 07 Nov 2021 13:51:40 -0600 [thread overview]
Message-ID: <87r1brpwvn.fsf@disp2133> (raw)
In-Reply-To: <8ba678da-207e-ea00-a56d-736a2184e69e@colorfullife.com> (Manfred Spraul's message of "Sat, 6 Nov 2021 15:42:07 +0100")
Manfred Spraul <manfred@colorfullife.com> writes:
> Hello together,
>
> On 11/5/21 22:34, Eric W. Biederman wrote:
>> +static inline void shm_clist_del(struct shmid_kernel *shp)
>> +{
>> + struct task_struct *creator;
>> +
>> + rcu_read_lock();
>> + creator = rcu_dereference(shp->shm_creator);
>> + if (creator) {
>> + task_lock(creator);
>> + list_del(&shp->shm_clist);
>> + task_unlock(creator);
>> + }
>> + rcu_read_unlock();
>> +}
>> +
>
> shm_clist_del() only synchronizes against exit_shm() when shm_creator
> is not NULL.
>
>
>> + list_del(&shp->shm_clist);
>> + rcu_assign_pointer(shp->shm_creator, NULL);
>> +
>
> We set shm_creator to NULL -> no more synchronization.
>
> Now IPC_RMID can run in parallel - regardless if we test for
> list_empty() or shm_creator.
>
>> +
>> + /* Guarantee shp lives after task_lock is dropped */
>> + ipc_getref(&shp->shm_perm);
>> +
>
> task_lock() doesn't help: As soon as shm_creator is set to NULL,
> IPC_RMID won't acquire task_lock() anymore.
>
> Thus shp can disappear before we arrive at this ipc_getref.
>
> [Yes, I think I have introduced this bug. ]
>
> Corrected version attached.
>
>
> I'll reboot and retest the patch, then I would send it to akpm as
> replacement for current patch in mmotm.
>
> --
>
> Manfred
>
> @@ -382,48 +425,94 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
> /* Locking assumes this will only be called with task == current */
> void exit_shm(struct task_struct *task)
> {
> - struct ipc_namespace *ns = task->nsproxy->ipc_ns;
> - struct shmid_kernel *shp, *n;
> + for (;;) {
> + struct shmid_kernel *shp;
> + struct ipc_namespace *ns;
>
> - if (list_empty(&task->sysvshm.shm_clist))
> - return;
> + task_lock(task);
> +
> + if (list_empty(&task->sysvshm.shm_clist)) {
> + task_unlock(task);
> + break;
> + }
> +
> + shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
> + shm_clist);
>
> - /*
> - * If kernel.shm_rmid_forced is not set then only keep track of
> - * which shmids are orphaned, so that a later set of the sysctl
> - * can clean them up.
> - */
> - if (!ns->shm_rmid_forced) {
> - down_read(&shm_ids(ns).rwsem);
> - list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
> - shp->shm_creator = NULL;
> /*
> - * Only under read lock but we are only called on current
> - * so no entry on the list will be shared.
> + * 1) get a reference to shp.
> + * This must be done first: Right now, task_lock() prevents
> + * any concurrent IPC_RMID calls. After the list_del_init(),
> + * IPC_RMID will not acquire task_lock(->shm_creator)
> + * anymore.
> */
> - list_del(&task->sysvshm.shm_clist);
> - up_read(&shm_ids(ns).rwsem);
> - return;
> - }
> + WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
>
> - /*
> - * Destroy all already created segments, that were not yet mapped,
> - * and mark any mapped as orphan to cover the sysctl toggling.
> - * Destroy is skipped if shm_may_destroy() returns false.
> - */
> - down_write(&shm_ids(ns).rwsem);
> - list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
> - shp->shm_creator = NULL;
> + /* 2) unlink */
> + list_del_init(&shp->shm_clist);
> +
> + /*
> + * 3) Get pointer to the ipc namespace. It is worth to say
> + * that this pointer is guaranteed to be valid because
> + * shp lifetime is always shorter than namespace lifetime
> + * in which shp lives.
> + * We taken task_lock it means that shp won't be freed.
> + */
> + ns = shp->ns;
>
> - if (shm_may_destroy(ns, shp)) {
> - shm_lock_by_ptr(shp);
> - shm_destroy(ns, shp);
> + /*
> + * 4) If kernel.shm_rmid_forced is not set then only keep track of
> + * which shmids are orphaned, so that a later set of the sysctl
> + * can clean them up.
> + */
> + if (!ns->shm_rmid_forced) {
> + ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
> + task_unlock(task);
> + continue;
> }
> - }
>
> - /* Remove the list head from any segments still attached. */
> - list_del(&task->sysvshm.shm_clist);
> - up_write(&shm_ids(ns).rwsem);
> + /*
> + * 5) get a reference to the namespace.
> + * The refcount could be already 0. If it is 0, then
> + * the shm objects will be free by free_ipc_work().
> + */
> + ns = get_ipc_ns_not_zero(ns);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Isn't this increment also too late? Doesn't this need to move up
by ipc_rcu_getref while shp is still on the list?
Assuming the code is running in parallel with shm_exit_ns after removal
from shm_clist shm_destroy can run to completion and shm_exit_ns can
run to completion and the ipc namespace can be freed.
Eric
next prev parent reply other threads:[~2021-11-07 19:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-27 22:43 [PATCH 0/2] shm: shm_rmid_forced feature fixes Alexander Mikhalitsyn
2021-10-27 22:43 ` [PATCH 1/2] ipc: WARN if trying to remove ipc object which is absent Alexander Mikhalitsyn
2021-10-27 22:43 ` [PATCH 2/2] shm: extend forced shm destroy to support objects from several IPC nses Alexander Mikhalitsyn
2021-10-30 4:26 ` Eric W. Biederman
2021-10-30 13:11 ` Manfred Spraul
2021-11-05 17:46 ` Eric W. Biederman
2021-11-05 19:03 ` Manfred Spraul
2021-11-05 21:34 ` [RFC] shm: extend forced shm destroy to support objects from several IPC nses (simplified) Eric W. Biederman
2021-11-06 7:50 ` Manfred Spraul
2021-11-06 14:42 ` Manfred Spraul
2021-11-07 19:51 ` Eric W. Biederman [this message]
2021-11-08 18:34 ` Manfred Spraul
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1brpwvn.fsf@disp2133 \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.mikhalitsyn@virtuozzo.com \
--cc=alexander@mihalicyn.com \
--cc=avagin@gmail.com \
--cc=dave@stgolabs.net \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=manfred@colorfullife.com \
--cc=ptikhomirov@virtuozzo.com \
--cc=stable@vger.kernel.org \
--cc=vvs@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox