All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
To: Oded Gabbay <ogabbay@kernel.org>
Cc: Tomer Tayar <ttayar@habana.ai>, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 08/27] habanalabs: add info when FD released while device still in use
Date: Thu, 16 Feb 2023 16:04:47 +0100	[thread overview]
Message-ID: <20230216150447.GJ2849548@linux.intel.com> (raw)
In-Reply-To: <CAFCwf13c6T-S2MgwmWJkrQTwdXYDGMK+GG8ZVUjPipsXNrW_ZQ@mail.gmail.com>

On Thu, Feb 16, 2023 at 04:21:48PM +0200, Oded Gabbay wrote:
> On Thu, Feb 16, 2023 at 2:25 PM Stanislaw Gruszka
> <stanislaw.gruszka@linux.intel.com> wrote:
> >
> > On Sun, Feb 12, 2023 at 10:44:35PM +0200, Oded Gabbay wrote:
> > > From: Tomer Tayar <ttayar@habana.ai>
> > >
> > > When user closes the device file descriptor, it is checked whether the
> > > device is still in use, and a message is printed if it is.
> >
> > I guess this is only for debugging your user-space component?
> > Because kernel driver should not make any assumption about
> > user-space behavior. Closing whenever user wants should be
> > no problem.
> First of all, indeed the user can close the device whatever it wants.
> We don't limit him, but we do need to track the device state, because
> we can't allow a new user to acquire the device until it is idle (due
> to h/w limitations).
> Therefore, this print is not so much for debug, as it is for letting
> the user know the device wasn't idle after he closed it, and
> therefore, we are going to reset it to make it idle.
> So, it is a notification that is important imo.

This sounds like something that should be handed in open() with -EAGAIN
error with eventual message in dmesg. But you know best what info
is needed by user-space :-)

> > > +static void print_device_in_use_info(struct hl_device *hdev, const char *message)
> > > +{
> > > +     u32 active_cs_num, dmabuf_export_cnt;
> > > +     char buf[64], *buf_ptr = buf;
> > > +     size_t buf_size = sizeof(buf);
> > > +     bool unknown_reason = true;
> > > +
> > > +     active_cs_num = hl_get_active_cs_num(hdev);
> > > +     if (active_cs_num) {
> > > +             unknown_reason = false;
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u active CS]", active_cs_num);
> > > +     }
> > > +
> > > +     dmabuf_export_cnt = atomic_read(&hdev->dmabuf_export_cnt);
> > > +     if (dmabuf_export_cnt) {
> > > +             unknown_reason = false;
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u exported dma-buf]",
> > > +                                             dmabuf_export_cnt);
> > > +     }
> > > +
> > > +     if (unknown_reason)
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [unknown reason]");
> > > +
> > > +     dev_notice(hdev->dev, "%s%s\n", message, buf);
> >
> > why not print counters directly, i.e. "active cs count %u, dmabuf export count %u" ?
> Because we wanted to print the specific reason, or unknown reason, and
> not print all the possible counters in one line, because most of the
> time most of the counters will be 0.
> We plan to add more reasons so this helper simplifies the code.

Ok, just place replace compose_device_in_use_info() with snprintf().
I don't think you need custom implementation of snprintf().

> > > +             print_device_in_use_info(hdev, "User process closed FD but device still in use");
> > >               hl_device_reset(hdev, HL_DRV_RESET_HARD);
> >
> > You really need reset here ?
> Yes, our h/w requires that we reset the device after the user closed
> it. If the device is not idle after the user closed it, we hard reset
> it.
> If it is idle, we do a more graceful reset.

Hmm, ok.

Regards
Stanislaw


  reply	other threads:[~2023-02-16 15:05 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-12 20:44 [PATCH 01/27] habanalabs/gaudi2: increase user interrupt grace time Oded Gabbay
2023-02-12 20:44 ` [PATCH 02/27] habanalabs/gaudi: capture RAZWI info only if HW indication detected Oded Gabbay
2023-02-12 20:44 ` [PATCH 03/27] habanalabs: split cdev creation to separate function Oded Gabbay
2023-02-16 10:40   ` Tomer Tayar
2023-02-12 20:44 ` [PATCH 04/27] habanalabs: save class in hdev Oded Gabbay
2023-02-16 10:40   ` Tomer Tayar
2023-02-12 20:44 ` [PATCH 05/27] habanalabs: refactor debugfs init Oded Gabbay
2023-02-16 10:40   ` Tomer Tayar
2023-02-12 20:44 ` [PATCH 06/27] habanalabs: use memhash_node_export_put() in hl_release_dmabuf() Oded Gabbay
2023-02-16 11:48   ` Stanislaw Gruszka
2023-02-16 14:26     ` Tomer Tayar
2023-02-16 14:40       ` Stanislaw Gruszka
2023-02-12 20:44 ` [PATCH 07/27] habanalabs/gaudi2: fix address decode RAZWI handling Oded Gabbay
2023-02-12 20:44 ` [PATCH 08/27] habanalabs: add info when FD released while device still in use Oded Gabbay
2023-02-16 12:25   ` Stanislaw Gruszka
2023-02-16 14:21     ` Oded Gabbay
2023-02-16 15:04       ` Stanislaw Gruszka [this message]
2023-02-17 11:34         ` Tomer Tayar
2023-02-20 15:54           ` Stanislaw Gruszka
2023-02-20 16:16             ` Tomer Tayar
2023-02-12 20:44 ` [PATCH 09/27] habanalabs: enforce release order of compute device and dma-buf Oded Gabbay
2023-02-12 20:44 ` [PATCH 10/27] habanalabs: add critical-event bit in notifier Oded Gabbay
2023-02-12 20:44 ` [PATCH 11/27] habanalabs/gaudi2: expose engine core int reg address Oded Gabbay
2023-02-12 20:44 ` [PATCH 12/27] habanalabs/gaudi2: unsecure CFG_TPC_ID register Oded Gabbay
2023-02-12 20:44 ` [PATCH 13/27] habanalabs: minimize error prints when mem map fails Oded Gabbay
2023-02-12 20:44 ` [PATCH 14/27] habanalabs: disable PCI when escalating compute to hard-reset Oded Gabbay
2023-02-12 20:44 ` [PATCH 15/27] habanalabs: enable graceful reset mechanism for compute-reset Oded Gabbay
2023-02-12 20:44 ` [PATCH 16/27] habanalabs/gaudi2: get reset type indication from irq_map Oded Gabbay
2023-02-12 20:44 ` [PATCH 17/27] habanalabs/gaudi2: modify events reset policy Oded Gabbay
2023-02-12 20:44 ` [PATCH 18/27] habanalabs: change user interrupt to threaded IRQ Oded Gabbay
2023-02-16 10:28   ` Stanislaw Gruszka
2023-02-16 13:47     ` Oded Gabbay
2023-02-16 14:29       ` Stanislaw Gruszka
2023-02-16 10:39   ` Stanislaw Gruszka
2023-02-16 13:49     ` Oded Gabbay
2023-02-12 20:44 ` [PATCH 19/27] habanalabs: capture interrupt timestamp in handler Oded Gabbay
2023-02-16 14:39   ` Stanislaw Gruszka
2023-02-19 12:42     ` Ofir Bitton
2023-02-12 20:44 ` [PATCH 20/27] habanalabs/gaudi2: add support for TPC assert Oded Gabbay
2023-02-12 20:44 ` [PATCH 21/27] habanalabs: fix print in hl_irq_handler_eq() Oded Gabbay
2023-02-12 20:44 ` [PATCH 22/27] habanalabs: remove hl_irq_handler_default() Oded Gabbay
2023-02-12 20:44 ` [PATCH 23/27] habanalabs: tiny refactor of hl_device_reset for readability Oded Gabbay
2023-02-12 20:44 ` [PATCH 24/27] habanalabs: rename security function parameters Oded Gabbay
2023-02-12 20:44 ` [PATCH 25/27] habanalabs: in hl_device_reset remove 'hard_instead_of_soft' Oded Gabbay
2023-02-12 20:44 ` [PATCH 26/27] habanalabs: in hl_device_reset small refactor for readabilty Oded Gabbay
2023-02-12 20:44 ` [PATCH 27/27] habanalabs: don't trace cpu accessible dma alloc/free Oded Gabbay
2023-02-16 10:53 ` [PATCH 01/27] habanalabs/gaudi2: increase user interrupt grace time Stanislaw Gruszka
2023-02-16 14:24   ` Oded Gabbay
2023-02-18 19:13     ` Ofir Bitton
2023-02-20 15:31 ` Stanislaw Gruszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230216150447.GJ2849548@linux.intel.com \
    --to=stanislaw.gruszka@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ogabbay@kernel.org \
    --cc=ttayar@habana.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.