* [PATCH] habanalabs: print to kernel log when reset is finished
@ 2019-08-10 12:38 Oded Gabbay
2019-08-10 12:53 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Oded Gabbay @ 2019-08-10 12:38 UTC (permalink / raw)
To: linux-kernel, oshpigelman, ttayar; +Cc: gregkh
Now that we don't print the queue testing messages, we need to print when
the reset is finished so whoever looks at the kernel log will know the
reset process was finished successfully and the driver is not stuck.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
drivers/misc/habanalabs/device.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 9a5926888b99..1fac808c2546 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
else
hdev->soft_reset_cnt++;
+ dev_info(hdev->dev, "Successfully finished resetting the device\n");
+
return 0;
out_err:
--
2.17.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-10 12:38 [PATCH] habanalabs: print to kernel log when reset is finished Oded Gabbay
@ 2019-08-10 12:53 ` Greg KH
2019-08-10 15:29 ` Oded Gabbay
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2019-08-10 12:53 UTC (permalink / raw)
To: Oded Gabbay; +Cc: linux-kernel, oshpigelman, ttayar
On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> Now that we don't print the queue testing messages, we need to print when
> the reset is finished so whoever looks at the kernel log will know the
> reset process was finished successfully and the driver is not stuck.
>
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> ---
> drivers/misc/habanalabs/device.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> index 9a5926888b99..1fac808c2546 100644
> --- a/drivers/misc/habanalabs/device.c
> +++ b/drivers/misc/habanalabs/device.c
> @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> else
> hdev->soft_reset_cnt++;
>
> + dev_info(hdev->dev, "Successfully finished resetting the device\n");
Really? For doing things "properly" there is no need to spam the kernel
log. Only spit stuff out if an error happens.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-10 12:53 ` Greg KH
@ 2019-08-10 15:29 ` Oded Gabbay
2019-08-10 17:23 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Oded Gabbay @ 2019-08-10 15:29 UTC (permalink / raw)
To: Greg KH; +Cc: Linux-Kernel@Vger. Kernel. Org, Omer Shpigelman, Tomer Tayar
On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > Now that we don't print the queue testing messages, we need to print when
> > the reset is finished so whoever looks at the kernel log will know the
> > reset process was finished successfully and the driver is not stuck.
> >
> > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > ---
> > drivers/misc/habanalabs/device.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > index 9a5926888b99..1fac808c2546 100644
> > --- a/drivers/misc/habanalabs/device.c
> > +++ b/drivers/misc/habanalabs/device.c
> > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > else
> > hdev->soft_reset_cnt++;
> >
> > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
>
> Really? For doing things "properly" there is no need to spam the kernel
> log. Only spit stuff out if an error happens.
>
> thanks,
>
> greg k-h
I beg to differ for two reasons:
1. Reset happens very rarely, if at all. So this message (that get
printed after reset is done) will definitely not spam the kernel log.
2. When a reset starts we print an appropriate error message. I think
it is expected by the user that we will also print if and when the
reset has finished successfully. I really believe that lack of this
printing might be deceiving for users.
Thanks,
Oded
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-10 15:29 ` Oded Gabbay
@ 2019-08-10 17:23 ` Greg KH
2019-08-10 17:45 ` Oded Gabbay
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2019-08-10 17:23 UTC (permalink / raw)
To: Oded Gabbay; +Cc: Linux-Kernel@Vger. Kernel. Org, Omer Shpigelman, Tomer Tayar
On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > Now that we don't print the queue testing messages, we need to print when
> > > the reset is finished so whoever looks at the kernel log will know the
> > > reset process was finished successfully and the driver is not stuck.
> > >
> > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > ---
> > > drivers/misc/habanalabs/device.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > index 9a5926888b99..1fac808c2546 100644
> > > --- a/drivers/misc/habanalabs/device.c
> > > +++ b/drivers/misc/habanalabs/device.c
> > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > else
> > > hdev->soft_reset_cnt++;
> > >
> > > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
> >
> > Really? For doing things "properly" there is no need to spam the kernel
> > log. Only spit stuff out if an error happens.
> >
> > thanks,
> >
> > greg k-h
>
> I beg to differ for two reasons:
> 1. Reset happens very rarely, if at all. So this message (that get
> printed after reset is done) will definitely not spam the kernel log.
> 2. When a reset starts we print an appropriate error message. I think
> it is expected by the user that we will also print if and when the
> reset has finished successfully. I really believe that lack of this
> printing might be deceiving for users.
How is anyone going to parse the kernel log for anything to know if
something happens?
How do you trigger a reset? Is it done by userspace? If so, just
notify them then.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-10 17:23 ` Greg KH
@ 2019-08-10 17:45 ` Oded Gabbay
2019-08-11 7:38 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Oded Gabbay @ 2019-08-10 17:45 UTC (permalink / raw)
To: Greg KH; +Cc: Linux-Kernel@Vger. Kernel. Org, Omer Shpigelman, Tomer Tayar
On Sat, Aug 10, 2019 at 8:23 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> > On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > > Now that we don't print the queue testing messages, we need to print when
> > > > the reset is finished so whoever looks at the kernel log will know the
> > > > reset process was finished successfully and the driver is not stuck.
> > > >
> > > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > > ---
> > > > drivers/misc/habanalabs/device.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > > index 9a5926888b99..1fac808c2546 100644
> > > > --- a/drivers/misc/habanalabs/device.c
> > > > +++ b/drivers/misc/habanalabs/device.c
> > > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > > else
> > > > hdev->soft_reset_cnt++;
> > > >
> > > > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
> > >
> > > Really? For doing things "properly" there is no need to spam the kernel
> > > log. Only spit stuff out if an error happens.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > I beg to differ for two reasons:
> > 1. Reset happens very rarely, if at all. So this message (that get
> > printed after reset is done) will definitely not spam the kernel log.
> > 2. When a reset starts we print an appropriate error message. I think
> > it is expected by the user that we will also print if and when the
> > reset has finished successfully. I really believe that lack of this
> > printing might be deceiving for users.
>
> How is anyone going to parse the kernel log for anything to know if
> something happens?
Actually our jenkins server parse the kernel log and look for bad
messages from the driver...
But the main reason for this, IMO, is for developers (in userspace)
that run with a terminal open on the kernel log to see if something
bad happens while they run their applications.
>
> How do you trigger a reset? Is it done by userspace? If so, just
> notify them then.
Reset can be triggered by two main reasons, but a userspace
application is *not* one of them.
It can be due to a command submission not finishing in time or from a
major error in the device (e.g. double ECC error).
That's why I think it is important to tell them the reset +
re-initialization flow was finished successfully.
Of course all this information, let's call it operational status, can
be acquired by the INFO IOCTL the driver provides, but it is *really*
convenient, IMHO, to see these type of messages in the kernel log
while they happen.
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-10 17:45 ` Oded Gabbay
@ 2019-08-11 7:38 ` Greg KH
2019-08-11 7:40 ` Oded Gabbay
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2019-08-11 7:38 UTC (permalink / raw)
To: Oded Gabbay; +Cc: Linux-Kernel@Vger. Kernel. Org, Omer Shpigelman, Tomer Tayar
On Sat, Aug 10, 2019 at 08:45:54PM +0300, Oded Gabbay wrote:
> On Sat, Aug 10, 2019 at 8:23 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> > > On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > > > Now that we don't print the queue testing messages, we need to print when
> > > > > the reset is finished so whoever looks at the kernel log will know the
> > > > > reset process was finished successfully and the driver is not stuck.
> > > > >
> > > > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > ---
> > > > > drivers/misc/habanalabs/device.c | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > > > index 9a5926888b99..1fac808c2546 100644
> > > > > --- a/drivers/misc/habanalabs/device.c
> > > > > +++ b/drivers/misc/habanalabs/device.c
> > > > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > > > else
> > > > > hdev->soft_reset_cnt++;
> > > > >
> > > > > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
> > > >
> > > > Really? For doing things "properly" there is no need to spam the kernel
> > > > log. Only spit stuff out if an error happens.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > I beg to differ for two reasons:
> > > 1. Reset happens very rarely, if at all. So this message (that get
> > > printed after reset is done) will definitely not spam the kernel log.
> > > 2. When a reset starts we print an appropriate error message. I think
> > > it is expected by the user that we will also print if and when the
> > > reset has finished successfully. I really believe that lack of this
> > > printing might be deceiving for users.
> >
> > How is anyone going to parse the kernel log for anything to know if
> > something happens?
>
> Actually our jenkins server parse the kernel log and look for bad
> messages from the driver...
> But the main reason for this, IMO, is for developers (in userspace)
> that run with a terminal open on the kernel log to see if something
> bad happens while they run their applications.
>
> >
> > How do you trigger a reset? Is it done by userspace? If so, just
> > notify them then.
>
> Reset can be triggered by two main reasons, but a userspace
> application is *not* one of them.
> It can be due to a command submission not finishing in time or from a
> major error in the device (e.g. double ECC error).
> That's why I think it is important to tell them the reset +
> re-initialization flow was finished successfully.
> Of course all this information, let's call it operational status, can
> be acquired by the INFO IOCTL the driver provides, but it is *really*
> convenient, IMHO, to see these type of messages in the kernel log
> while they happen.
Ah, yes, if this is something that userspace can not trigger then it's
ok. But you should then make it a 'dev_warn' to match up with the
message level when the reset starts.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] habanalabs: print to kernel log when reset is finished
2019-08-11 7:38 ` Greg KH
@ 2019-08-11 7:40 ` Oded Gabbay
0 siblings, 0 replies; 7+ messages in thread
From: Oded Gabbay @ 2019-08-11 7:40 UTC (permalink / raw)
To: Greg KH; +Cc: Linux-Kernel@Vger. Kernel. Org, Omer Shpigelman, Tomer Tayar
On Sun, Aug 11, 2019 at 10:38 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sat, Aug 10, 2019 at 08:45:54PM +0300, Oded Gabbay wrote:
> > On Sat, Aug 10, 2019 at 8:23 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> > > > On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > > > > Now that we don't print the queue testing messages, we need to print when
> > > > > > the reset is finished so whoever looks at the kernel log will know the
> > > > > > reset process was finished successfully and the driver is not stuck.
> > > > > >
> > > > > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > ---
> > > > > > drivers/misc/habanalabs/device.c | 2 ++
> > > > > > 1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > > > > index 9a5926888b99..1fac808c2546 100644
> > > > > > --- a/drivers/misc/habanalabs/device.c
> > > > > > +++ b/drivers/misc/habanalabs/device.c
> > > > > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > > > > else
> > > > > > hdev->soft_reset_cnt++;
> > > > > >
> > > > > > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
> > > > >
> > > > > Really? For doing things "properly" there is no need to spam the kernel
> > > > > log. Only spit stuff out if an error happens.
> > > > >
> > > > > thanks,
> > > > >
> > > > > greg k-h
> > > >
> > > > I beg to differ for two reasons:
> > > > 1. Reset happens very rarely, if at all. So this message (that get
> > > > printed after reset is done) will definitely not spam the kernel log.
> > > > 2. When a reset starts we print an appropriate error message. I think
> > > > it is expected by the user that we will also print if and when the
> > > > reset has finished successfully. I really believe that lack of this
> > > > printing might be deceiving for users.
> > >
> > > How is anyone going to parse the kernel log for anything to know if
> > > something happens?
> >
> > Actually our jenkins server parse the kernel log and look for bad
> > messages from the driver...
> > But the main reason for this, IMO, is for developers (in userspace)
> > that run with a terminal open on the kernel log to see if something
> > bad happens while they run their applications.
> >
> > >
> > > How do you trigger a reset? Is it done by userspace? If so, just
> > > notify them then.
> >
> > Reset can be triggered by two main reasons, but a userspace
> > application is *not* one of them.
> > It can be due to a command submission not finishing in time or from a
> > major error in the device (e.g. double ECC error).
> > That's why I think it is important to tell them the reset +
> > re-initialization flow was finished successfully.
> > Of course all this information, let's call it operational status, can
> > be acquired by the INFO IOCTL the driver provides, but it is *really*
> > convenient, IMHO, to see these type of messages in the kernel log
> > while they happen.
>
> Ah, yes, if this is something that userspace can not trigger then it's
> ok. But you should then make it a 'dev_warn' to match up with the
> message level when the reset starts.
>
> thanks,
>
> greg k-h
Sure, np.
Thanks,
Oded
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-08-11 7:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-08-10 12:38 [PATCH] habanalabs: print to kernel log when reset is finished Oded Gabbay
2019-08-10 12:53 ` Greg KH
2019-08-10 15:29 ` Oded Gabbay
2019-08-10 17:23 ` Greg KH
2019-08-10 17:45 ` Oded Gabbay
2019-08-11 7:38 ` Greg KH
2019-08-11 7:40 ` Oded Gabbay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox