From: Pranay Srivastava <pranjas@gmail.com>
To: Markus Pargmann <mpa@pengutronix.de>
Cc: nbd-general@lists.sourceforge.net, linux-kernel@vger.kernel.org,
	Wouter Verhelst <w@uter.be>
Subject: Re: [PATCH v2 4/5]nbd: make nbd device wait for its users.
Date: Wed, 15 Jun 2016 14:47:23 +0530
Message-ID: <CA+aCy1EsbLqR6rOTGwNjaAvL3FPjoH=TV3C_WMU5H1DWMC-sqA@mail.gmail.com>
In-Reply-To: <11733279.HlKjcK63G4@adelgunde>

Hey Markus,

On Wed, Jun 15, 2016 at 12:00 PM, Markus Pargmann <mpa@pengutronix.de> wrote:
> Hi Pranay,
>
> On Tuesday 14 June 2016 15:03:40 Pranay Srivastava wrote:
>> Hi Markus,
>>
>> On Tue, Jun 14, 2016 at 2:29 PM, Markus Pargmann <mpa@pengutronix.de> wrote:
>> >
>> > On Thursday 02 June 2016 13:25:00 Pranay Kr. Srivastava wrote:
>> > > When a timeout occurs or a recv fails, then
>> > > instead of abruptly killing nbd block device
>> > > wait for its users to finish.
>> > >
>> > > This is more required when filesystem(s) like
>> > > ext2 or ext3 don't expect their buffer heads to
>> > > disappear while the filesystem is mounted.
>> > >
>> > > Each open of a nbd device is refcounted, while
>> > > the userland program [nbd-client] doing the
>> > > NBD_DO_IT ioctl would now wait for any other users
>> > > of this device before invalidating the nbd device.
>> > >
>> > > Signed-off-by: Pranay Kr. Srivastava <pranjas@gmail.com>
>> > > ---
>> > >  drivers/block/nbd.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >  1 file changed, 58 insertions(+)
>> > >
>> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> > > index d1d898d..4da40dc 100644
>> > > --- a/drivers/block/nbd.c
>> > > +++ b/drivers/block/nbd.c
>> > > @@ -70,10 +70,13 @@ struct nbd_device {
>> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
>> > >       struct dentry *dbg_dir;
>> > >  #endif
>> > > +     atomic_t inuse;
>> > >       /*
>> > >        *This is specifically for calling sock_shutdown, for now.
>> > >        */
>> > >       struct work_struct ws_shutdown;
>> > > +     struct kref users;
>> > > +     struct completion user_completion;
>> > >  };
>> > >
>> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
>> > > @@ -104,6 +107,7 @@ static DEFINE_SPINLOCK(nbd_lock);
>> > >   * Shutdown function for nbd_dev work struct.
>> > >   */
>> > >  static void nbd_ws_func_shutdown(struct work_struct *);
>> > > +static void nbd_kref_release(struct kref *);
>> > >
>> > >  static inline struct device *nbd_to_dev(struct nbd_device *nbd)
>> > >  {
>> > > @@ -682,6 +686,8 @@ static void nbd_reset(struct nbd_device *nbd)
>> > >       nbd->flags = 0;
>> > >       nbd->xmit_timeout = 0;
>> > >       INIT_WORK(&nbd->ws_shutdown, nbd_ws_func_shutdown);
>> > > +     init_completion(&nbd->user_completion);
>> > > +     kref_init(&nbd->users);
>> > >       queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
>> > >       del_timer_sync(&nbd->timeout_timer);
>> > >  }
>> > > @@ -815,6 +821,14 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>> > >               kthread_stop(thread);
>> > >
>> > >               sock_shutdown(nbd);
>> > > +             /*
>> > > +              * kref_init initializes with ref count as 1,
>> > > +              * nbd_client, or the user-land program executing
>> > > +              * this ioctl will make the refcount to 2[at least]
>> > > +              * so subtracting 2 from refcount.
>> > > +              */
>> > > +             kref_sub(&nbd->users, 2, nbd_kref_release);
>> >
>> > Why don't you use a kref_put?
>>
>> Ok, so I'll try to explain as I've understood the problem.
>>
>> When the module is loaded the kref is initialized to 1.
>>
>> Suppose now, someone has started nbd-client [nbdC-1] , then this
>> nbd-client will increase the ref count to 2. So far so good...
>>
>> Now let's say this device is being shutdown via nbd-client[nbdC-2].
>>
>> nbdC-1 will subtract the refcount by two, it has to do in NBD_DO_IT
>> since device file will not
>> be closed until after ioctl is over, and it'll wait_for_completion.
>>
>> nbdC-2 now closes its use of device file, this makes the refcount as
>> zero and completion
>> is triggered with nbdC-1 completed.
>>
>> Now we don't want to trigger kref_put when nbdC-1 closes the device
>> file so kref_put needs
>> to be conditional in this regard so for that in_use is used.
>>
>>
>> >
>> > > +             wait_for_completion(&nbd->user_completion);
>> > >               mutex_lock(&nbd->tx_lock);
>> > >               nbd_clear_que(nbd);
>> > >               kill_bdev(bdev);
>> > > @@ -865,13 +879,56 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
>> > >
>> > >       return error;
>> > >  }
>> > > +static void nbd_kref_release(struct kref *kref_users)
>> > > +{
>> > > +     struct nbd_device *nbd = container_of(kref_users, struct nbd_device,
>> > > +                     users);
>> >
>> > Not indented to opening bracket.
>> >
>> > > +     pr_debug("Releasing kref [%s]\n", __func__);
>> > > +     atomic_set(&nbd->inuse, 0);
>> > > +     complete(&nbd->user_completion);
>> > > +
>> > > +}
>> > > +
>> > > +static int nbd_open(struct block_device *bdev, fmode_t mode)
>> > > +{
>> > > +     struct nbd_device *nbd_dev = bdev->bd_disk->private_data;
>> > > +
>> > > +     if (kref_get_unless_zero(&nbd_dev->users))
>> > > +             atomic_set(&nbd_dev->inuse, 1);
>> > > +
>> > > +     pr_debug("Opening nbd_dev %s. Active users = %u\n",
>> > > +                     bdev->bd_disk->disk_name,
>> > > +                     atomic_read(&nbd_dev->users.refcount) - 1);
>> >
>> > Indent to opening bracket.
>> >
>> > > +     return 0;
>> > > +}
>> > > +
>> > > +static void nbd_release(struct gendisk *disk, fmode_t mode)
>> > > +{
>> > > +     struct nbd_device *nbd_dev = disk->private_data;
>> > > +     /*
>> > > +     *kref_init initializes ref count to 1, so we
>> > > +     *we check for refcount to be 2 for a final put.
>> > > +     *
>> > > +     *kref needs to be re-initialized just here as the
>> > > +     *other process holding it must see the ref count as 2.
>> > > +     */
>> > > +     if (atomic_read(&nbd_dev->inuse))
>> > > +             kref_put(&nbd_dev->users,  nbd_kref_release);
>> >
>>
>> > What is this inuse atomic for? Everyone that releases the nbd device
>> > will need to execute a kref_put().
>>
>> To do away with inuse, perhaps we can do
>>
>> kref_get just before leaving the NBD_DO_IT? so that when device file
>> is closed everyone
>> would do a kref_put? However there's a small race window while the
>> kref is being initialized,
>> and another process [not just nbd-client] is trying to open the device.
>>
>> Do you think it's better to do this by introducing a spin_lock instead
>> of atomic?
>>
>> Let me know if my understanding is correct.
>
> Thanks for the explanations. I think my understanding was off by one ;).
> I didn't realize that the DO_IT thread from the userspace has the block
> device open as well.
>
> I thought a bit about this, does it make sense to delay the essential
> cleanup steps until really all open file handles were closed? So that
> even if the DO_IT thread exits, the block device is still there. Only if
> the file is closed everything is cleaned up. Maybe this makes the code
> simpler and we can directly use krefs without any strange constructs.
> What do you think?
>

I chose open/close because that is the interface common to every process
that uses the nbd device, not just nbd-client.

Take a mount as an example: some user has just mounted this device and it
is currently idle. Then someone issues an NBD_DISCONNECT and wham... the
device goes away under the mount. The idea is to propagate errors to user
space correctly.

So the solution I've proposed says: if anyone apart from nbd-client (which
is currently waiting for NBD_DO_IT to complete) is using this device, then
nbd-client should honor that and shouldn't allow the device to be reused
until all such processes have left it.


> This would also allow the client to setup a new socket as long as it
> does not close the nbd file handle.

I think that is still possible, right? That is why there's a kref_sub of 2:
the wait is only for the "other processes" using this device and _not_ for
the nbd-client whose NBD_DO_IT just finished.
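
To make the counting concrete, here is a rough userspace model of what the
patch does. It is not kernel code: the struct and the model_* helpers are
made up for illustration, and plain ints/bools stand in for the kref, the
atomic and the completion.

```c
/* Userspace model only, not kernel code. All names are illustrative. */
#include <assert.h>
#include <stdbool.h>

struct nbd_model {
	int users;       /* stands in for struct kref users */
	bool inuse;      /* stands in for atomic_t inuse */
	bool completed;  /* stands in for struct completion user_completion */
};

/* Mirrors nbd_init() + nbd_reset(): inuse cleared, count starts at 1. */
static void model_init(struct nbd_model *m)
{
	m->inuse = false;
	m->users = 1;            /* kref_init() */
	m->completed = false;
}

/* Drop n references; at zero, do what nbd_kref_release() does. */
static void model_sub(struct nbd_model *m, int n)
{
	assert(m->users >= n);
	m->users -= n;
	if (m->users == 0) {
		m->inuse = false;
		m->completed = true; /* complete(&nbd->user_completion) */
	}
}

/* nbd_open(): take a reference unless the count already hit zero. */
static void model_open(struct nbd_model *m)
{
	if (m->users != 0) {
		m->users++;
		m->inuse = true;
	}
}

/* nbd_release(): only put while the device is marked in use. */
static void model_close(struct nbd_model *m)
{
	if (m->inuse)
		model_sub(m, 1);
}

/* End of NBD_DO_IT: kref_sub(&nbd->users, 2, nbd_kref_release). */
static void model_do_it_exit(struct nbd_model *m)
{
	model_sub(m, 2);
}

/*
 * nbd-client plus one other opener (say, a mount). Returns true if the
 * completion fired only after the other opener had closed the device.
 */
bool model_scenario(void)
{
	struct nbd_model m;
	bool fired_early;

	model_init(&m);
	model_open(&m);          /* nbd-client opens: count is 2 */
	model_open(&m);          /* mount opens:      count is 3 */
	model_do_it_exit(&m);    /* 3 - 2 = 1, nbd-client would wait here */
	fired_early = m.completed;
	model_close(&m);         /* mount closes: count is 0, completion fires */
	model_close(&m);         /* nbd-client closes: inuse is clear, no put */
	return !fired_early && m.completed && m.users == 0;
}
```

With only nbd-client holding the device, the count goes 1 -> 2 -> 0 and
the completion fires immediately, so NBD_DO_IT does not block.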

What I'm concerned about now is processes that try to open this nbd device
before the re-initialization code at the end of NBD_DO_IT has run. Such a
process may end up without a kref, because of kref_get_unless_zero.
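
A small userspace illustration of that window, assuming C11 atomics as a
stand-in for the kref (get_unless_zero and the demo function are
hypothetical names, not the kernel implementation):

```c
/* Userspace illustration only, not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Stand-in for kref_get_unless_zero(): take a reference only while the
 * count is non-zero.
 */
static bool get_unless_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Returns true if an open() that lands between the count reaching zero
 * and the re-initialization ends up without a reference.
 */
bool open_in_window_gets_no_ref(void)
{
	atomic_int users;
	bool got_ref;

	atomic_init(&users, 1);              /* kref_init(): 1 */
	if (!get_unless_zero(&users))        /* nbd-client opens: 2 */
		return false;
	atomic_fetch_sub(&users, 2);         /* kref_sub(2): 0 */

	got_ref = get_unless_zero(&users);   /* late opener, before re-init */

	atomic_store(&users, 1);             /* re-initialization: back to 1 */
	return !got_ref && atomic_load(&users) == 1;
}
```

In the patch nbd_open still returns 0 in that case, so the late opener
holds the device open without being counted.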

Actually I wanted to put this kref under bd_mutex to avoid such races; then
it would always be a kref_get in open and a kref_put in close. kref_sub(2)
would still be required :-)
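
Roughly what I have in mind, again as a userspace sketch only. A pthread
mutex stands in for bd_mutex, the lm_* names are invented, and how the
count gets re-armed after the wait is deliberately left out since that is
still undecided:

```c
/* Userspace sketch only, not kernel code. All names are illustrative. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct locked_model {
	pthread_mutex_t lock;  /* stands in for bd_mutex */
	int users;             /* stands in for the kref count */
	bool completed;        /* stands in for user_completion */
};

static void lm_init(struct locked_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->users = 1;          /* as after kref_init() */
	m->completed = false;
}

/* open(): always takes a reference, no unless-zero check. */
static void lm_open(struct locked_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->users++;
	pthread_mutex_unlock(&m->lock);
}

/* Drop n references; signal completion when the count reaches zero. */
static void lm_sub(struct locked_model *m, int n)
{
	pthread_mutex_lock(&m->lock);
	assert(m->users >= n);
	m->users -= n;
	if (m->users == 0)
		m->completed = true;
	pthread_mutex_unlock(&m->lock);
}

/* close(): always drops exactly one reference. */
static void lm_close(struct locked_model *m)
{
	lm_sub(m, 1);
}

/*
 * nbd-client and a mount both open the device, NBD_DO_IT ends with a
 * sub of 2, and the completion fires only when the mount closes.
 */
bool locked_scenario(void)
{
	struct locked_model m;
	bool fired_early;

	lm_init(&m);
	lm_open(&m);           /* nbd-client: 2 */
	lm_open(&m);           /* mount:      3 */
	lm_sub(&m, 2);         /* end of NBD_DO_IT: 1 */
	fired_early = m.completed;
	lm_close(&m);          /* mount closes: 0, completion fires */
	return !fired_early && m.completed && m.users == 0;
}
```

The sub of 2 is still there because nbd-client's own descriptor stays open
until after the ioctl returns.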

>
> Could this behavior be potentially problematic for any client
> implementation? Does it solve our other issue with setting up a new
> sockets for an existing nbd blockdevice?

I don't think that would be a problem. sock_shutdown is called before the
wait, right? So in the error case the worker thread would close the socket
and set it to NULL, while in the normal case sock_shutdown in NBD_DO_IT
would do that. So it should be OK.

>
> Cc Wouter
>
> Best Regards,
>
> Markus
>
>>
>>
>> >
>> > Best Regards,
>> >
>> > Markus
>> >
>> > > +
>> > > +     pr_debug("Closing nbd_dev %s. Active users = %u\n",
>> > > +                     disk->disk_name,
>> > > +                     atomic_read(&nbd_dev->users.refcount) - 1);
>> > > +}
>> > >
>> > >  static const struct block_device_operations nbd_fops = {
>> > >       .owner =        THIS_MODULE,
>> > >       .ioctl =        nbd_ioctl,
>> > >       .compat_ioctl = nbd_ioctl,
>> > > +     .open =         nbd_open,
>> > > +     .release =      nbd_release
>> > >  };
>> > >
>> > > +
>> > >  static void nbd_ws_func_shutdown(struct work_struct *ws_nbd)
>> > >  {
>> > >       struct nbd_device *nbd_dev = container_of(ws_nbd, struct nbd_device,
>> > > @@ -1107,6 +1164,7 @@ static int __init nbd_init(void)
>> > >               disk->fops = &nbd_fops;
>> > >               disk->private_data = &nbd_dev[i];
>> > >               sprintf(disk->disk_name, "nbd%d", i);
>> > > +             atomic_set(&nbd_dev[i].inuse, 0);
>> > >               nbd_reset(&nbd_dev[i]);
>> > >               add_disk(disk);
>> > >       }
>> > >
>> >
>> > --
>> > Pengutronix e.K.                           |                             |
>> > Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
>> > Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>>
>>
>>
>>
>>
>
> --
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |



-- 
        ---P.K.S
