public inbox for linux-bluetooth@vger.kernel.org
* [Bluez-devel] Hardware Error event patch
@ 2005-03-25 14:09 Catalin Drula
  2005-03-26 11:47 ` Marcel Holtmann
  2005-03-29 17:19 ` Steven Singer
  0 siblings, 2 replies; 5+ messages in thread
From: Catalin Drula @ 2005-03-25 14:09 UTC
  To: bluez-devel

Hi Marcel,

I've finished the patch for handling the Hardware Error event and you have
it attached below.

To briefly remind the context: when H4 (HCI over UART) is used
as the transport layer between the host and the Bluetooth controller
and the controller detects a loss of synchronization, it sends a
"Hardware Error" event to the host, which should then send a "Reset"
command for resynchronization. The procedure is described under "Error
Recovery" in the H:4 appendix of Bluetooth v1.1 specification.
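
As an illustration of the exchange described above, here is a minimal
userspace sketch of the two packets involved. It is not part of the patch.
The helper names (hci_opcode, h4_parse_hw_error, h4_build_reset) are made
up for this example; the byte values are the H4 packet indicators, the
Hardware Error event code (0x10) and the Reset opcode (OGF 0x03, OCF 0x0003).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H4 packet indicators and HCI constants. */
#define H4_PKT_COMMAND      0x01
#define H4_PKT_EVENT        0x04
#define EVT_HARDWARE_ERROR  0x10
#define OGF_HOST_CTL        0x03
#define OCF_RESET           0x0003

/* Pack an OGF/OCF pair into a 16-bit HCI opcode. */
static uint16_t hci_opcode(uint8_t ogf, uint16_t ocf)
{
	return (uint16_t)((ocf & 0x03ff) | ((uint16_t)ogf << 10));
}

/*
 * If buf holds an H4-framed Hardware Error event, store its one-byte
 * hardware code in *hwcode and return 1; otherwise return 0.
 * (Hypothetical helper, for illustration only.)
 */
static int h4_parse_hw_error(const uint8_t *buf, size_t len, uint8_t *hwcode)
{
	assert(buf != NULL && hwcode != NULL);
	if (len < 4)
		return 0;
	if (buf[0] != H4_PKT_EVENT || buf[1] != EVT_HARDWARE_ERROR || buf[2] != 1)
		return 0;
	*hwcode = buf[3];
	return 1;
}

/*
 * Write an H4-framed Reset command into out.
 * Returns the number of bytes written, or 0 if out is too small.
 * (Hypothetical helper, for illustration only.)
 */
static size_t h4_build_reset(uint8_t *out, size_t cap)
{
	uint16_t op = hci_opcode(OGF_HOST_CTL, OCF_RESET);

	assert(out != NULL);
	if (cap < 4)
		return 0;
	out[0] = H4_PKT_COMMAND;
	out[1] = (uint8_t)(op & 0xff);	/* opcode, little endian */
	out[2] = (uint8_t)(op >> 8);
	out[3] = 0;			/* no parameters */
	return 4;
}
```

The hardware code byte is vendor specific, so the sketch only extracts it
and does not try to interpret it.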

The patch mainly follows your suggested steps for resetting the stack
state (a reset of the acl_cnt and sco_cnt was missing).

There is only one thing that's not quite right in that patch: I'm enabling
page and inquiry scanning after the reset. That's because on my hardware
after the reset it disables page and inquiry scanning. The specification
(v1.1) says that after a reset the controller reverts to the default
values of configuration parameters (for Scan_Enable that default value is
"no scans"). I don't think we maintain the state of Scan_Enable in the stack
although we could (of course, we can't do anything if the Write Scan Enable
command is issued directly from userspace with hci_send_cmd). It probably
makes more sense to remove that Write Scan Enable command.

I have tested the patch and it seems to work fine as you can see in the
log below. There's a while loop in the background starting l2test
instances that send to a remote host. (My comments are prepended by
"<<<").

l2test[3762]: Connected [imtu 672, omtu 672, flush_to 65535, handle 1]
l2test[3762]: Sending ...
<<< l2test starts
hci_hardware_error_evt: hci0 Hardware Error event: 1
l2test[3762]: Send failed: Software caused connection abort (103)
<<< hw error occurs
root@h3900:~# hcitool con
Connections:
<<< ACL connection is correctly torn down
root@h3900:~# hciconfig -a
hci0:   Type: UART
        BD Address: 08:00:17:1A:EB:76 ACL MTU: 339:4 SCO MTU: 60:9
        UP RUNNING PSCAN ISCAN
        RX bytes:12461 acl:66 sco:0 events:31 errors:0
        TX bytes:2119 acl:23 sco:0 commands:14 errors:0
        Features: 0xff 0x3b 0x05 0x00 0x00 0x00 0x00 0x00
        Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
        Link policy:
        Link mode: SLAVE ACCEPT
        Name: 'POCKET_PC'
        Class: 0x000000
        Service Classes: Unspecified
        Device Class: Miscellaneous,
        HCI Ver: 1.1 (0x1) HCI Rev: 0x180 LMP Ver: 1.1 (0x1) LMP Subver: 0x180
        Manufacturer: RTX Telecom A/S (21)
<<< synchronization is still there (hciconfig -a issues a bunch of
<<< commands to fill in this information)
l2test[3770]: Connected [imtu 672, omtu 672, flush_to 65535, handle 1]
l2test[3770]: Sending ...
<<< second l2test starts
hci_hardware_error_evt: hci0 Hardware Error event: 1
l2test[3770]: Send failed: Software caused connection abort (103)
<<< another hw error event
root@h3900:~# hcitool con
Connections:
root@h3900:~# cat /proc/bluetooth/l2cap
00:00:00:00:00:00 00:00:00:00:00:00 4 4097 0x0000 0x0000 672 0 0x0
<<< the l2cap connection is torn down (the remaining one is a different
<<< l2test instance that is listening)

The Bluetooth module in the HP PocketPC iPAQ h5550 is very buggy as you
can see. It turns out that going into "no scan" mode improves stability by
quite a lot (instead of hw error events occurring immediately, they occur
after some hours of testing, if at all). In fact, the Widcomm stack under
Windows CE on this machine appears to do two things:

1. Whenever a connection is established it goes into non-discoverable,
non-connectable mode.
2. Whenever a connection is ongoing, it refuses to open a second
connection to another device.

So basically it's limiting the user to one connection at a time.

Regards,

Catalin

diff -ur linux-2.6.11-mh2/include/net/bluetooth/hci.h linux-2.6.11-mh2-hwerr/include/net/bluetooth/hci.h
--- linux-2.6.11-mh2/include/net/bluetooth/hci.h	2005-03-24 13:02:39.000000000 +0100
+++ linux-2.6.11-mh2-hwerr/include/net/bluetooth/hci.h	2005-03-25 14:12:04.566749715 +0100
@@ -584,6 +584,12 @@
 	__u16    clock_offset;
 } __attribute__ ((packed));

+#define HCI_EV_HARDWARE_ERROR	0x10
+struct hci_ev_hardware_error {
+	 __u8     hwcode;
+} __attribute__ ((packed));
+
+
 /* Internal events generated by Bluetooth stack */
 #define HCI_EV_STACK_INTERNAL	0xFD
 struct hci_ev_stack_internal {
diff -ur linux-2.6.11-mh2/net/bluetooth/hci_core.c linux-2.6.11-mh2-hwerr/net/bluetooth/hci_core.c
--- linux-2.6.11-mh2/net/bluetooth/hci_core.c	2005-03-24 13:02:43.000000000 +0100
+++ linux-2.6.11-mh2-hwerr/net/bluetooth/hci_core.c	2005-03-25 14:23:23.891761572 +0100
@@ -646,6 +646,64 @@
 	return ret;
 }

+int hci_dev_reset_hwerr(struct hci_dev *hdev) {
+	int ret = 0;
+	 __u8 scan = 0x03;
+
+	hci_req_lock(hdev);
+
+	/* Disable RX and TX tasks */
+	tasklet_disable(&hdev->rx_task);
+	tasklet_disable(&hdev->tx_task);
+
+	/* Flush connection hash */
+	hci_dev_lock_bh(hdev);
+	hci_conn_hash_flush(hdev);
+	hci_dev_unlock_bh(hdev);
+
+	/* Flush driver */
+	if (hdev->flush)
+		hdev->flush(hdev);
+
+	/* Disable cmd task */
+	tasklet_disable(&hdev->cmd_task);
+
+	/* Drop queues */
+	skb_queue_purge(&hdev->rx_q);
+        skb_queue_purge(&hdev->cmd_q);
+        skb_queue_purge(&hdev->raw_q);
+
+        /* Reset command counter */
+	atomic_set(&hdev->cmd_cnt, 1);
+
+	/* Drop last sent command */
+	if (hdev->sent_cmd) {
+		kfree_skb(hdev->sent_cmd);
+		hdev->sent_cmd = NULL;
+	}
+
+	/* Send reset command */
+	hci_send_cmd(hdev, OGF_HOST_CTL, OCF_RESET, 0, NULL);
+
+	/* Send read buffer size command to reset ACL and SCO counters */
+	hci_send_cmd(hdev, OGF_INFO_PARAM, OCF_READ_BUFFER_SIZE, 0, NULL);
+
+	/* Optional initialization for buggy hardware */
+
+	/* Enable inquiry and page scanning */
+	hci_send_cmd(hdev, OGF_HOST_CTL, OCF_WRITE_SCAN_ENABLE, 1, &scan);
+
+	/* Enable tasks */
+	tasklet_enable(&hdev->rx_task);
+	tasklet_enable(&hdev->tx_task);
+	tasklet_enable(&hdev->cmd_task);
+
+	hci_req_unlock(hdev);
+
+	return ret;
+}
+EXPORT_SYMBOL(hci_dev_reset_hwerr);
+
 int hci_dev_cmd(unsigned int cmd, void __user *arg)
 {
 	struct hci_dev *hdev;
diff -ur linux-2.6.11-mh2/net/bluetooth/hci_event.c linux-2.6.11-mh2-hwerr/net/bluetooth/hci_event.c
--- linux-2.6.11-mh2/net/bluetooth/hci_event.c	2005-03-24 13:08:31.000000000 +0100
+++ linux-2.6.11-mh2-hwerr/net/bluetooth/hci_event.c	2005-03-25 14:16:26.405451648 +0100
@@ -866,6 +866,16 @@
 	hci_dev_unlock(hdev);
 }

+/* Hardware Error */
+static inline void hci_hardware_error_evt(struct hci_dev *hdev, struct sk_buff *skb) {
+	struct hci_ev_hardware_error *ev = (struct hci_ev_hardware_error *) skb->data;
+
+	BT_ERR("%s Hardware Error event: %d", hdev->name, ev->hwcode);
+
+	hci_dev_reset_hwerr(hdev);
+}
+
+
 void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb)
 {
 	struct hci_event_hdr *hdr = (struct hci_event_hdr *) skb->data;
@@ -938,6 +948,10 @@
 		hci_clock_offset_evt(hdev, skb);
 		break;

+	case HCI_EV_HARDWARE_ERROR:
+		hci_hardware_error_evt(hdev, skb);
+		break;
+
 	case HCI_EV_CMD_STATUS:
 		cs = (struct hci_ev_cmd_status *) skb->data;
 		skb_pull(skb, sizeof(cs));




-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Bluez-devel mailing list
Bluez-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bluez-devel


* Re: [Bluez-devel] Hardware Error event patch
  2005-03-25 14:09 [Bluez-devel] Hardware Error event patch Catalin Drula
@ 2005-03-26 11:47 ` Marcel Holtmann
  2005-03-29 17:19 ` Steven Singer
  1 sibling, 0 replies; 5+ messages in thread
From: Marcel Holtmann @ 2005-03-26 11:47 UTC
  To: BlueZ Mailing List

Hi Catalin,

> I've finished the patch for handling the Hardware Error event and you have
> it attached below.
> 
> To briefly remind the context: when H4 (HCI over UART) is used
> as the transport layer between the host and the Bluetooth controller
> and the controller detects a loss of synchronization, it sends a
> "Hardware Error" event to the host, which should then send a "Reset"
> command for resynchronization. The procedure is described under "Error
> Recovery" in the H:4 appendix of Bluetooth v1.1 specification.

The EXPORT_SYMBOL is not needed, and please check the tabs versus spaces
issue. I think that an hci_req_cancel() is also needed.

> The patch mainly follows your suggested steps for resetting the stack
> state (a reset of the acl_cnt and sco_cnt was missing).

I don't like to do that via a command. Simply reset them.

> There is only one thing that's not quite right in that patch: I'm enabling
> page and inquiry scanning after the reset. That's because on my hardware
> after the reset it disables page and inquiry scanning. The specification
> (v1.1) says that after a reset the controller reverts to the default
> values of configuration parameters (for Scan_Enable that default value is
> "no scans"). I don't think we maintain the state of Scan_Enable in the stack
> although we could (of course, we can't do anything if the Write Scan Enable
> command is issued directly from userspace with hci_send_cmd). It probably
> makes more sense to remove that Write Scan Enable command.

You will find the current state in hdev->flags. However I am not sure
who should take care of setting it again. Maybe we should send a reset
notification to the userspace.
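
If the kernel were to restore the scan state itself, one possible shape is
sketched below: take a snapshot of the flags before the reset and derive the
Scan_Enable parameter from it. The flag bit positions here are assumptions
made for the example and may not match the real HCI_PSCAN and HCI_ISCAN
definitions; the Scan_Enable values are the ones from the specification.

```c
#include <assert.h>
#include <stdint.h>

/* Scan_Enable parameter values from the HCI specification. */
#define SCAN_DISABLED	0x00
#define SCAN_INQUIRY	0x01
#define SCAN_PAGE	0x02

/* Assumed bit positions, for illustration only. */
#define FLAG_PSCAN	(1UL << 3)
#define FLAG_ISCAN	(1UL << 4)

/*
 * Translate a snapshot of the device flags into the one-byte
 * Scan_Enable parameter for a Write Scan Enable command.
 */
static uint8_t scan_enable_from_flags(unsigned long flags)
{
	uint8_t scan = SCAN_DISABLED;

	if (flags & FLAG_PSCAN)
		scan |= SCAN_PAGE;
	if (flags & FLAG_ISCAN)
		scan |= SCAN_INQUIRY;
	return scan;
}
```

This would replace the hard-coded 0x03 in the patch, but it does not settle
the question of whether the kernel or userspace should issue the command.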

> The Bluetooth module in the HP PocketPC iPAQ h5550 is very buggy as you
> can see. It turns out that going into "no scan" mode improves stability by
> quite a lot (instead of hw error events occurring immediately, they occur
> after some hours of testing, if at all). In fact, the Widcomm stack under
> Windows CE on this machine appears to do two things:
> 
> 1. Whenever a connection is established it goes into non-discoverable,
> non-connectable mode.
> 2. Whenever a connection is ongoing, it refuses to open a second
> connection to another device.
> 
> So basically it's limiting the user to one connection at a time.

That is a crazy thing to do and actually I think the chip itself is
totally broken if you need to use such a procedure.

Regards

Marcel





* Re: [Bluez-devel] Hardware Error event patch
  2005-03-25 14:09 [Bluez-devel] Hardware Error event patch Catalin Drula
  2005-03-26 11:47 ` Marcel Holtmann
@ 2005-03-29 17:19 ` Steven Singer
  2005-03-29 17:30   ` Marcel Holtmann
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Singer @ 2005-03-29 17:19 UTC
  To: bluez-devel

Catalin Drula wrote:
> I've finished the patch for handling the Hardware Error event and you have
> it attached below.
> 
> To briefly remind the context: when H4 (HCI over UART) is used
> as the transport layer between the host and the Bluetooth controller
> and the controller detects a loss of synchronization, it sends a
> "Hardware Error" event to the host, which should then send a "Reset"
> command for resynchronization. The procedure is described under "Error
> Recovery" in the H:4 appendix of Bluetooth v1.1 specification.

Are you resetting for all hardware error events, or just when you think
that H4 synchronisation has been lost?

It is true that the spec says that a device will issue a hardware error
when synchronisation is lost but it doesn't say that that's the only
reason for a device to issue a hardware error.

CSR devices, for example, use hardware error code 0xFE to mean that H4
synchronisation has been lost. Other hardware error events mean other
things and HCI_Reset is not the appropriate action in all cases. In some
cases no action is required. In other cases user intervention will be
needed to clear the error and we'll emit a hardware error on every boot
until the problem is resolved. A few cases will require a harder reset
than an HCI_Reset.

You probably don't want to reset if you receive a hardware error and
you were not using the H4 host transport.

	- Steven

* Re: [Bluez-devel] Hardware Error event patch
  2005-03-29 17:19 ` Steven Singer
@ 2005-03-29 17:30   ` Marcel Holtmann
  2005-03-30 19:01     ` [Bluez-devel] " Catalin Drula
  0 siblings, 1 reply; 5+ messages in thread
From: Marcel Holtmann @ 2005-03-29 17:30 UTC
  To: BlueZ Mailing List

Hi Steven,

> > I've finished the patch for handling the Hardware Error event and you have
> > it attached below.
> > 
> > To briefly remind the context: when H4 (HCI over UART) is used
> > as the transport layer between the host and the Bluetooth controller
> > and the controller detects a loss of synchronization, it sends a
> > "Hardware Error" event to the host, which should then send a "Reset"
> > command for resynchronization. The procedure is described under "Error
> > Recovery" in the H:4 appendix of Bluetooth v1.1 specification.
> 
> Are you resetting for all hardware error events, or just when you think
> that H4 synchronisation has been lost?
> 
> It is true that the spec says that a device will issue a hardware error
> when synchronisation is lost but it doesn't say that that's the only
> reason for a device to issue a hardware error.
> 
> CSR devices, for example, use hardware error code 0xFE to mean that H4
> synchronisation has been lost. Other hardware error events mean other
> things and HCI_Reset is not the appropriate action in all cases. In some
> cases no action is required. In other cases user intervention will be
> needed to clear the error and we'll emit a hardware error on every boot
> until the problem is resolved. A few cases will require a harder reset
> than an HCI_Reset.
> 
> You probably don't want to reset if you receive a hardware error and
> you were not using the H4 host transport.

thanks for the information. You are making a good point here. However
the error code is another weird vendor specific thing in the Bluetooth
specification. Proposals on how to deal with it are very welcome.

Regards

Marcel





* [Bluez-devel] Re: Hardware Error event patch
  2005-03-29 17:30   ` Marcel Holtmann
@ 2005-03-30 19:01     ` Catalin Drula
  0 siblings, 0 replies; 5+ messages in thread
From: Catalin Drula @ 2005-03-30 19:01 UTC
  To: bluez-devel

Hi Marcel & Steven,

Marcel Holtmann <marcel <at> holtmann.org> writes:
 
> > > I've finished the patch for handling the Hardware Error event and you have
> > > it attached below.
> > > 
> > > To briefly remind the context: when H4 (HCI over UART) is used
> > > as the transport layer between the host and the Bluetooth controller
> > > and the controller detects a loss of synchronization, it sends a
> > > "Hardware Error" event to the host, which should then send a "Reset"
> > > command for resynchronization. The procedure is described under "Error
> > > Recovery" in the H:4 appendix of Bluetooth v1.1 specification.
> > 
> > Are you resetting for all hardware error events, or just when you think
> > that H4 synchronisation has been lost?
> > 
> > It is true that the spec says that a device will issue a hardware error
> > when synchronisation is lost but it doesn't say that that's the only
> > reason for a device to issue a hardware error.
> > 
> > CSR devices, for example, use hardware error code 0xFE to mean that H4
> > synchronisation has been lost. Other hardware error events mean other
> > things and HCI_Reset is not the appropriate action in all cases. In some
> > cases no action is required. In other cases user intervention will be
> > needed to clear the error and we'll emit a hardware error on every boot
> > until the problem is resolved. A few cases will require a harder reset
> > than an HCI_Reset.
> > 
> > You probably don't want to reset if you receive a hardware error and
> > you were not using the H4 host transport.
> 
> thanks for the information. You are making a good point here. However
> the error code is another weird vendor specific thing in the Bluetooth
> specification. Proposals on how to deal with it are very welcome.

Steven is clearly right, but I don't see how we could deal with the
vendor-specific code. The Bluetooth chip in the iPAQ h5550 (RTX Telecom, but in
fact rumour has it that it's a National Semiconductor LMX 9814) uses code 0x01
for H4 loss of synchronization. It would not be feasible to use these
vendor-specific codes, on the one hand because they are not (or not always)
publicly available, and on the other hand because matching the vendor string
and hardware error codes would be overkill anyway.

I would however argue that we do need to take action in case of a loss of
synchronization and that this patch is needed. I agree that this is one of the
things that "should not happen" (the UART should be error free), but it so
happens that many devices on the market have these loss of sync problems, and
it would drastically improve their usability to have the stack recover properly
from a loss of synchronization.

I suggest we do what Steven said and only perform our recovery procedure
if H4 is being used. That definitely makes sense. As for the other reasons a
hardware error event might arise (when using H4)... well, first of all, I
suppose that 99.99% of times it is a loss of synchronization causing the event,
and second, in the remaining 0.01%, I doubt sending a reset would hurt. 
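
The decision proposed above can be sketched as a small policy function. The
enum names are hypothetical and chosen for this example only; how the real
stack identifies an H4 transport is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transport and action identifiers, for illustration only. */
enum transport { TRANSPORT_USB, TRANSPORT_UART_H4, TRANSPORT_OTHER };
enum hw_error_action { HW_ERROR_LOG_ONLY, HW_ERROR_RESET };

/*
 * Decide what to do with a Hardware Error event. The hardware code is
 * vendor specific, so it is deliberately not interpreted: only the
 * transport decides whether the recovery procedure runs.
 */
static enum hw_error_action hw_error_policy(enum transport t, uint8_t hwcode)
{
	(void)hwcode;
	return t == TRANSPORT_UART_H4 ? HW_ERROR_RESET : HW_ERROR_LOG_ONLY;
}
```

With this policy, code 0x01 from my chip and code 0xFE from a CSR device both
lead to a reset on H4, while any code on another transport is only logged.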

By the way, Marcel, I'll fix my patch up according to your suggestions (and with
the modification that we only perform the procedure when H4 is the host
transport), but it will take another couple of days.

Regards,

Catalin



