From: Andrew Morton <akpm@linux-foundation.org>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: jack@suse.cz, amir73il@gmail.com, willy@infradead.org,
	linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	gorcunov@virtuozzo.com
Subject: Re: [PATCH v2] inotify: Extend ioctl to allow to request id of new watch descriptor
Date: Fri, 9 Feb 2018 12:56:56 -0800
Message-ID: <20180209125656.e440e0518540d6b76ae42bc0@linux-foundation.org>
In-Reply-To: <ca006760-de72-37b3-f6fd-311c86f29b62@virtuozzo.com>

On Fri, 9 Feb 2018 18:04:54 +0300 Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> A watch descriptor is the id of a watch created by inotify_add_watch().
> It is allocated in inotify_add_to_idr() and takes numbers
> starting from 1. Every new inotify watch obtains the next available
> number (usually old + 1), as served by idr_alloc_cyclic().
> 
> The CRIU (Checkpoint/Restore In Userspace) project supports inotify
> files and restores watch descriptors with the same numbers
> they had before the dump. Since there was no kernel support, we
> had to use a loop to add a watch with a specific descriptor id:
> 
> 	while (1) {
> 		int wd;
> 
> 		wd = inotify_add_watch(inotify_fd, path, mask);
> 		if (wd < 0) {
> 			break;
> 		} else if (wd == desired_wd_id) {
> 			ret = 0;
> 			break;
> 		}
> 
> 		inotify_rm_watch(inotify_fd, wd);
> 	}
> 
> (You may find the actual code at the link below:
>  https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)
> 
> The loop is suboptimal and very expensive, but since there was no better
> kernel support, it was the only way to restore the descriptor. Fortunately,
> we mostly encountered descriptors with small ids, and this approach worked
> well enough.
> 
> But recently, containers with inotify files holding large watch descriptors
> have begun to appear, and this approach stopped working at all. When the
> descriptor id is around 0x34d71d6, the restoring process spins in a busy
> loop for a long time, the restore hangs, and the delay in migration from
> node to node is easily noticeable.
> 
> This patch aims to solve this problem. It introduces a new ioctl,
> INOTIFY_IOC_SETNEXTWD, which lets userspace request the number of the next
> created watch descriptor. It simply calls the idr_set_cursor() primitive
> to populate idr::idr_next, so that the next idr_alloc_cyclic() allocation
> returns this id if it is not occupied. This is the same approach that is
> used to restore some other resources from userspace. For example,
> /proc/sys/kernel/ns_last_pid works the same way for task pids.
> 
> The new code is under the CONFIG_CHECKPOINT_RESTORE #define, so small
> systems may exclude it.
> 
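
For illustration, a minimal userspace sketch of how a restorer might use the
ioctl described above. The helper name add_watch_with_id() is hypothetical,
and the fallback #define of INOTIFY_IOC_SETNEXTWD is an assumption about the
value in the uapi header (the quoted description does not state it). The id
is passed by value in the ioctl argument, which matches the
"arg >= 1 && arg <= INT_MAX" check in the hunk below. If the ioctl is
unavailable (for example, CONFIG_CHECKPOINT_RESTORE is off), the sketch falls
back to the add/remove loop from the quoted description.

```c
#include <stdint.h>
#include <sys/inotify.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: this matches the definition in linux/inotify.h. It is defined
 * here only when the installed headers do not already provide it. */
#ifndef INOTIFY_IOC_SETNEXTWD
#define INOTIFY_IOC_SETNEXTWD _IOW('I', 0, int32_t)
#endif

/* Hypothetical helper: add a watch on path and insist that it gets
 * desired_wd. Returns desired_wd on success, -1 on failure. */
static int add_watch_with_id(int inotify_fd, const char *path,
                             uint32_t mask, int desired_wd)
{
    int wd;

    if (desired_wd < 1)
        return -1;

    /* Fast path: move the kernel's allocation cursor to desired_wd. */
    if (ioctl(inotify_fd, INOTIFY_IOC_SETNEXTWD,
              (unsigned long)desired_wd) == 0) {
        wd = inotify_add_watch(inotify_fd, path, mask);
        if (wd == desired_wd)
            return wd;
        /* The requested id was occupied, so a different one came back. */
        if (wd >= 0)
            inotify_rm_watch(inotify_fd, wd);
        return -1;
    }

    /* Slow path: no ioctl support, so cycle through ids one at a time. */
    for (;;) {
        wd = inotify_add_watch(inotify_fd, path, mask);
        if (wd < 0)
            return -1;
        if (wd == desired_wd)
            return wd;
        inotify_rm_watch(inotify_fd, wd);
    }
}
```

A caller would open the descriptor with inotify_init1() and invoke
add_watch_with_id(fd, path, mask, saved_wd) once per watch recorded at dump
time. With the ioctl available, each restore costs two syscalls instead of
up to saved_wd iterations of the loop.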

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>

With a little cleanup:

--- a/fs/notify/inotify/inotify_user.c~inotify-extend-ioctl-to-allow-to-request-id-of-new-watch-descriptor-fix
+++ a/fs/notify/inotify/inotify_user.c
@@ -285,7 +285,6 @@ static int inotify_release(struct inode
 static long inotify_ioctl(struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
-	struct inotify_group_private_data *data __maybe_unused;
 	struct fsnotify_group *group;
 	struct fsnotify_event *fsn_event;
 	void __user *p;
@@ -294,7 +293,6 @@ static long inotify_ioctl(struct file *f
 
 	group = file->private_data;
 	p = (void __user *) arg;
-	data = &group->inotify_data;
 
 	pr_debug("%s: group=%p cmd=%u\n", __func__, group, cmd);
 
@@ -313,6 +311,9 @@ static long inotify_ioctl(struct file *f
 	case INOTIFY_IOC_SETNEXTWD:
 		ret = -EINVAL;
 		if (arg >= 1 && arg <= INT_MAX) {
+			struct inotify_group_private_data *data;
+
+			data = &group->inotify_data;
 			spin_lock(&data->idr_lock);
 			idr_set_cursor(&data->idr, (unsigned int)arg);
 			spin_unlock(&data->idr_lock);
_
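
The idr_set_cursor() call in the hunk above only moves the cyclic allocation
cursor; it does not reserve anything. A toy userspace model of that behaviour
is sketched below. It is not the kernel IDR implementation: the toy_* names,
the flat array, and the 16-entry id space are all assumptions made purely for
illustration.

```c
#include <stdbool.h>

#define TOY_MAX_ID 16	/* tiny id space so wrap-around is easy to see */

struct toy_idr {
	bool used[TOY_MAX_ID + 1];	/* ids are 1..TOY_MAX_ID; slot 0 unused */
	int next;			/* plays the role of idr::idr_next */
};

/* Model of idr_set_cursor(): just remember where to search from. */
static void toy_set_cursor(struct toy_idr *idr, int val)
{
	idr->next = val;
}

/* Model of idr_alloc_cyclic(): take the first free id at or above the
 * cursor, wrap around to start if none is left, then advance the cursor
 * past the id that was handed out. Returns -1 when every id is in use. */
static int toy_alloc_cyclic(struct toy_idr *idr, int start)
{
	int from = idr->next > start ? idr->next : start;
	int id;

	for (id = from; id <= TOY_MAX_ID; id++)
		if (!idr->used[id])
			goto found;
	for (id = start; id < from && id <= TOY_MAX_ID; id++)
		if (!idr->used[id])
			goto found;
	return -1;
found:
	idr->used[id] = true;
	idr->next = id + 1;
	return id;
}
```

In this model, setting the cursor to a free id makes the next allocation
return exactly that id. Setting it to an occupied id yields the next free id
above it, which is why a caller should still compare the returned watch
descriptor with the requested one.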
