public inbox for linux-kernel@vger.kernel.org
* [patch netdrvr sis900] net: come alive after temporary memory shortage
@ 2005-09-26 12:19 Konstantin Khorenko
  2005-09-26 13:06 ` Daniele Venzano
  0 siblings, 1 reply; 3+ messages in thread
From: Konstantin Khorenko @ 2005-09-26 12:19 UTC
  To: Daniele Venzano
  Cc: Vasily Averin, Stanislav Protassov, Ollie Lho, linux-net,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4123 bytes --]


This patch solves the following problems:
1) A forgotten counter increment in sis900_rx() when it fails to
     get memory for an skb, which leads to failure of the whole interface.
     The problem is accompanied by the messages:
    eth0: Memory squeeze,deferring packet.
    eth0: NULL pointer encountered in Rx ring, skipping
2) If the cur_rx counter overflows during a temporary memory shortage,
     the buffer can't be recreated later, when memory IS available.
3) Limit the work done in the handler to prevent endless packet processing
     if new packets arrive faster than they are handled.

Signed-off-by: Konstantin Khorenko <khorenko@sw.ru>
Signed-off-by: Vasily Averin <vvs@sw.ru>

-----------------------------------

We had a customer who complained about a problem with a network card
that is supported by the sis900 driver.
Problem description: at a random time the card suddenly stops working, and
only a reboot brings it back to work.
The failure is accompanied by messages in /var/log/messages:
    eth0: Memory squeeze,deferring packet.
    eth0: NULL pointer encountered in Rx ring, skipping
    eth0: NULL pointer encountered in Rx ring, skipping
    eth0: NULL pointer encountered in Rx ring, skipping
(until reboot)

We discovered that this problem is already known:
http://www.ussg.iu.edu/hypermail/linux/kernel/0407.3/0566.html
http://www.kernelnewbies.org/documents/kdoc/sis900/problems.html

Nevertheless it has not been fixed so far, so we tried to fix it.

(1) Function sis900_rx().
During normal execution dirty_rx < cur_rx is ALWAYS true.
Let's assume we are short of memory.

	unsigned int entry = sis_priv->cur_rx % NUM_RX_DESC;
	...
	while (rx_status & OWN) {
		...
		if (some error check) {
			...
		} else {
5. On the next call of the function, rx_skbuff[cur_rx % NUM_RX_DESC] is
      still NULL from the previous call, which means rx_skbuff[entry] == NULL

			if (sis_priv->rx_skbuff[entry] == NULL) {
				printk(KERN_INFO "%s: NULL pointer "
					"encountered in Rx ring, skipping\n",
					net_dev->name);
6. Print and exit while() loop.
				break;
			 }
			...
1. fail here. -->	if ((skb = dev_alloc_skb(RX_BUF_SIZE)) == NULL) {
				...
				printk(KERN_INFO "%s: Memory squeeze,"
				       "deferring packet.\n",
				       net_dev->name);
-->				sis_priv->rx_skbuff[entry] = NULL;
2. now sis_priv->rx_skbuff[cur_rx%] == NULL
				...
				break;
3. and we exit the while() loop without incrementing cur_rx!
			}
			...
		} // of else
		sis_priv->cur_rx++;
		entry = sis_priv->cur_rx % NUM_RX_DESC;
		...
	} //of while

4. We refill the buffers rx_skbuff[entry] only for entries below cur_rx, so
      rx_skbuff[cur_rx % NUM_RX_DESC] == NULL both before and AFTER the loop

	for (; sis_priv->cur_rx > sis_priv->dirty_rx; sis_priv->dirty_rx++) {
		entry = sis_priv->dirty_rx % NUM_RX_DESC;
		if (sis_priv->rx_skbuff[entry] == NULL) {
			...
			sis_priv->rx_skbuff[entry] = skb;
			...
		}
	}

No matter how many times the function is called, cur_rx won't be
incremented, and thus rx_skbuff[cur_rx % NUM_RX_DESC] will be NULL forever,
which results in never-ending messages and dropped packets.
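
For illustration, the sequence above can be reproduced with a small
user-space model. This is only a sketch, not driver code: struct ring,
rx_once() and progress_after_squeeze() are hypothetical names, NUM_RX_DESC
is assumed to be 16, allocation success is passed in as a flag, and the
hardware descriptors (OWN bit etc.) are not modelled at all.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RX_DESC 16	/* assumed ring size, for illustration only */

/* Hypothetical stand-in for the driver's private state. */
struct ring {
	unsigned int cur_rx;
	unsigned int dirty_rx;
	void *rx_skbuff[NUM_RX_DESC];
};

static int dummy;	/* any non-NULL pointer stands in for an skb */

/*
 * One simulated call of the receive routine with one packet pending.
 * alloc_ok models dev_alloc_skb() succeeding or failing inside while();
 * fixed selects whether cur_rx is advanced on the failure path.
 * Returns true if the packet was consumed.
 */
static bool rx_once(struct ring *r, bool alloc_ok, bool fixed)
{
	unsigned int entry = r->cur_rx % NUM_RX_DESC;
	bool progressed = false;

	if (r->rx_skbuff[entry] == NULL) {
		/* "NULL pointer encountered in Rx ring, skipping" */
	} else if (!alloc_ok) {
		/* "Memory squeeze,deferring packet." */
		r->rx_skbuff[entry] = NULL;
		if (fixed)
			r->cur_rx++;	/* the increment added by the patch */
	} else {
		r->cur_rx++;
		progressed = true;
	}

	/* refill loop; memory is assumed to be available here */
	for (; r->cur_rx != r->dirty_rx; r->dirty_rx++) {
		entry = r->dirty_rx % NUM_RX_DESC;
		if (r->rx_skbuff[entry] == NULL)
			r->rx_skbuff[entry] = &dummy;
	}
	return progressed;
}

/*
 * One failed allocation followed by `calls` calls with memory available
 * again; returns how many of those later calls consumed a packet.
 */
static unsigned int progress_after_squeeze(bool fixed, unsigned int calls)
{
	struct ring r = { 0 };
	unsigned int i, ok = 0;

	for (i = 0; i < NUM_RX_DESC; i++)
		r.rx_skbuff[i] = &dummy;

	rx_once(&r, false, fixed);	/* temporary memory shortage */
	for (i = 0; i < calls; i++)
		ok += rx_once(&r, true, fixed);
	return ok;
}
```

In this model progress_after_squeeze(false, n) never consumes a packet
again, because the refill loop has nothing to do while cur_rx == dirty_rx,
whereas progress_after_squeeze(true, n) recovers on the very next call.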
------

(2) The same function sis900_rx().

	for (; sis_priv->cur_rx > sis_priv->dirty_rx; sis_priv->dirty_rx++) {
		entry = sis_priv->dirty_rx % NUM_RX_DESC;
		if (sis_priv->rx_skbuff[entry] == NULL) {
			...skb = dev_alloc_skb(RX_BUF_SIZE)...
			...
			sis_priv->rx_skbuff[entry] = skb;
			...
		}
	}

Assume cur_rx has overflowed during a previous execution of the while()
loop, but dirty_rx has NOT, and we really need to refill buffers.
The comparison sis_priv->cur_rx > sis_priv->dirty_rx will then be false
and the buffers won't be refilled.
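
A minimal sketch of the two comparisons, assuming both counters are
32-bit unsigned values (the helper names below are hypothetical and exist
only for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Refill condition as written in the walkthrough above. */
static bool need_refill_gt(uint32_t cur_rx, uint32_t dirty_rx)
{
	return cur_rx > dirty_rx;
}

/* Refill condition used by the patch. */
static bool need_refill_ne(uint32_t cur_rx, uint32_t dirty_rx)
{
	return cur_rx != dirty_rx;
}

/*
 * Number of ring entries waiting for a new buffer. Unsigned subtraction
 * is done modulo 2^32, so the result stays correct across the wrap.
 */
static uint32_t pending_refills(uint32_t cur_rx, uint32_t dirty_rx)
{
	return cur_rx - dirty_rx;
}
```

With dirty_rx = 0xFFFFFFFE and cur_rx three steps ahead (wrapped around
to 1), need_refill_gt() is false although three entries are pending,
while need_refill_ne() is true and pending_refills() returns 3.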
----------

(3) The same function sis900_rx().
Assume the whole buffer is filled, there is no memory shortage, and the
network card receives packets faster than the kernel processes them in the
while (rx_status & OWN) loop of sis900_rx(): execution will never leave
the loop.
sis900_rx() is called from the interrupt handler, and it's not a good idea
to `do "too much" work here` (a sentence from the sources :) )

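The idea of a work limit can be sketched as follows. This is a user-space
model under stated assumptions, not the driver code: packet_ready() is a
hypothetical device model in which a new packet is always waiting, rx_poll()
is a made-up name, and NUM_RX_DESC is assumed to be 16.

```c
#include <stdbool.h>

#define NUM_RX_DESC 16	/* assumed ring size, for illustration only */

/*
 * Hypothetical device model: a packet is always ready, i.e. the OWN bit
 * is set again every time the handler looks at the ring.
 */
static bool packet_ready(void)
{
	return true;
}

/* Handle at most `budget` packets per call; return how many were handled. */
static int rx_poll(int budget)
{
	int handled = 0;

	while (packet_ready()) {
		if (--budget < 0)
			break;	/* leave the rest for the next interrupt */
		handled++;
	}
	return handled;
}
```

Without the budget check this loop would never terminate. With it,
rx_poll(NUM_RX_DESC) handles at most one full ring per call, and a budget
of 0 means nothing is processed at all, so the initial value of the limit
matters.
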
----------
We hope you'll check these changes and find them useful. :)
The patched kernels compile but are untested.
This patch is against the mainstream 2.6.13.1 kernel.

-- 
Best regards,

Konstantin Khorenko,
SWsoft, Inc.


[-- Attachment #2: diff-sis900-2.6.13.1 --]
[-- Type: text/plain, Size: 2298 bytes --]

--- ./drivers/net/sis900.c.sis900	2005-08-29 03:41:01.000000000 +0400
+++ ./drivers/net/sis900.c	2005-09-19 14:34:42.000000000 +0400
@@ -1696,6 +1696,14 @@ static int sis900_rx(struct net_device *
 	long ioaddr = net_dev->base_addr;
 	unsigned int entry = sis_priv->cur_rx % NUM_RX_DESC;
 	u32 rx_status = sis_priv->rx_ring[entry].cmdsts;
+	/*
+	 * If cur > dirty, then limit = NUM_RX_DESC - cur + dirty =
+	 *				NUM_RX_DESC + (dirty - cur)
+	 * If cur < dirty (cur overflowed, dirty - not), then
+	 *			limit = dirty - cur
+	 */
+	int rx_work_limit =
+		(sis_priv->dirty_rx - sis_priv->cur_rx) % NUM_RX_DESC;
 
 	if (netif_msg_rx_status(sis_priv))
 		printk(KERN_DEBUG "sis900_rx, cur_rx:%4.4d, dirty_rx:%4.4d "
@@ -1705,6 +1713,8 @@ static int sis900_rx(struct net_device *
 	while (rx_status & OWN) {
 		unsigned int rx_size;
 
+		if (--rx_work_limit < 0)
+			break;
 		rx_size = (rx_status & DSIZE) - CRC_SIZE;
 
 		if (rx_status & (ABORT|OVERRUN|TOOLONG|RUNT|RXISERR|CRCERR|FAERR)) {
@@ -1770,6 +1780,7 @@ static int sis900_rx(struct net_device *
 				sis_priv->rx_ring[entry].cmdsts = 0;
 				sis_priv->rx_ring[entry].bufptr = 0;
 				sis_priv->stats.rx_dropped++;
+				sis_priv->cur_rx++;
 				break;
 			}
 			skb->dev = net_dev;
@@ -1787,7 +1798,7 @@ static int sis900_rx(struct net_device *
 
 	/* refill the Rx buffer, what if the rate of refilling is slower
 	 * than consuming ?? */
-	for (;sis_priv->cur_rx - sis_priv->dirty_rx > 0; sis_priv->dirty_rx++) {
+	for (; sis_priv->cur_rx != sis_priv->dirty_rx; sis_priv->dirty_rx++) {
 		struct sk_buff *skb;
 
 		entry = sis_priv->dirty_rx % NUM_RX_DESC;



* [patch netdrvr sis900] net: come alive after temporary memory shortage
@ 2005-09-26 12:26 Konstantin Khorenko
  0 siblings, 0 replies; 3+ messages in thread
From: Konstantin Khorenko @ 2005-09-26 12:26 UTC
  To: Daniele Venzano
  Cc: Vasily Averin, Stanislav Protassov, linux-net, linux-kernel,
	marcelo

[-- Attachment #1: Type: text/plain, Size: 4121 bytes --]


This patch solves the following problems:
1) A forgotten counter increment in sis900_rx() when it fails to
     get memory for an skb, which leads to failure of the whole interface.
     The problem is accompanied by the messages:
    eth0: Memory squeeze,deferring packet.
    eth0: NULL pointer encountered in Rx ring, skipping
2) If the cur_rx counter overflows during a temporary memory shortage,
     the buffer can't be recreated later, when memory IS available.
3) Limit the work done in the handler to prevent endless packet processing
     if new packets arrive faster than they are handled.

Signed-off-by: Konstantin Khorenko <khorenko@sw.ru>
Signed-off-by: Vasily Averin <vvs@sw.ru>

The patched kernels compile but are untested.
This patch is against the mainstream 2.4.31 kernel.

-- 
Best regards,

Konstantin Khorenko,
SWsoft, Inc.


[-- Attachment #2: diff-sis900-2.4.31 --]
[-- Type: text/plain, Size: 2285 bytes --]

--- ./drivers/net/sis900.c.sis900	2004-08-08 03:26:05.000000000 +0400
+++ ./drivers/net/sis900.c	2005-09-16 17:17:27.000000000 +0400
@@ -1613,6 +1613,14 @@ static int sis900_rx(struct net_device *
 	long ioaddr = net_dev->base_addr;
 	unsigned int entry = sis_priv->cur_rx % NUM_RX_DESC;
 	u32 rx_status = sis_priv->rx_ring[entry].cmdsts;
+	/*
+	 * If cur > dirty, then limit = NUM_RX_DESC - cur + dirty =
+	 *				NUM_RX_DESC + (dirty - cur)
+	 * If cur < dirty (cur overflowed, dirty - not), then
+	 *			limit = dirty - cur
+	 */
+	int rx_work_limit =
+		(sis_priv->dirty_rx - sis_priv->cur_rx) % NUM_RX_DESC;
 
 	if (sis900_debug > 3)
 		printk(KERN_INFO "sis900_rx, cur_rx:%4.4d, dirty_rx:%4.4d "
@@ -1622,6 +1630,8 @@ static int sis900_rx(struct net_device *
 	while (rx_status & OWN) {
 		unsigned int rx_size;
 
+		if (--rx_work_limit < 0)
+			break;
 		rx_size = (rx_status & DSIZE) - CRC_SIZE;
 
 		if (rx_status & (ABORT|OVERRUN|TOOLONG|RUNT|RXISERR|CRCERR|FAERR)) {
@@ -1688,6 +1698,7 @@ static int sis900_rx(struct net_device *
 				sis_priv->rx_ring[entry].cmdsts = 0;
 				sis_priv->rx_ring[entry].bufptr = 0;
 				sis_priv->stats.rx_dropped++;
+				sis_priv->cur_rx++;
 				break;
 			}
 			skb->dev = net_dev;
@@ -1705,7 +1716,7 @@ static int sis900_rx(struct net_device *
 
 	/* refill the Rx buffer, what if the rate of refilling is slower than 
 	   consuming ?? */
-	for (;sis_priv->cur_rx - sis_priv->dirty_rx > 0; sis_priv->dirty_rx++) {
+	for (; sis_priv->cur_rx != sis_priv->dirty_rx; sis_priv->dirty_rx++) {
 		struct sk_buff *skb;
 
 		entry = sis_priv->dirty_rx % NUM_RX_DESC;



* Re: [patch netdrvr sis900] net: come alive after temporary memory shortage
  2005-09-26 12:19 Konstantin Khorenko
@ 2005-09-26 13:06 ` Daniele Venzano
  0 siblings, 0 replies; 3+ messages in thread
From: Daniele Venzano @ 2005-09-26 13:06 UTC
  To: Konstantin Khorenko
  Cc: Vasily Averin, Stanislav Protassov, linux-net,
	Linux Kernel Mailing List

On 26 Sep 2005, at 14:19, Konstantin Khorenko wrote:
> Hope, you'll check this changes and find them usefull. :)
> Kernels with patches compile but untested.
> This patch is against mainstream 2.6.13.1 kernel.

> --- ./drivers/net/sis900.c.sis900    2005-08-29 03:41:01.000000000 +0400
> +++ ./drivers/net/sis900.c    2005-09-19 14:34:42.000000000 +0400

Please create the diff one directory above the root sources directory  
so that it is possible to apply with 'patch -p1'.

> @@ -1696,6 +1696,14 @@ static int sis900_rx(struct net_device *
>      long ioaddr = net_dev->base_addr;
>      unsigned int entry = sis_priv->cur_rx % NUM_RX_DESC;
>      u32 rx_status = sis_priv->rx_ring[entry].cmdsts;
> +    /*
> +     * If cur > dirty, then limit = NUM_RX_DESC - cur + dirty =
> +     *                NUM_RX_DESC + (dirty - cur)
> +     * If cur < dirty (cur overflowed, dirty - not), then
> +     *            limit = dirty - cur
> +     */
> +    int rx_work_limit =
> +        (sis_priv->dirty_rx - sis_priv->cur_rx) % NUM_RX_DESC;
Remove this comment, or move it to the description of the function  
above the sis900_rx() declaration.

>
>      if (netif_msg_rx_status(sis_priv))
>          printk(KERN_DEBUG "sis900_rx, cur_rx:%4.4d, dirty_rx:%4.4d "
> @@ -1705,6 +1713,8 @@ static int sis900_rx(struct net_device *
>      while (rx_status & OWN) {
>          unsigned int rx_size;
>
> +        if (--rx_work_limit < 0)
> +            break;
>          rx_size = (rx_status & DSIZE) - CRC_SIZE;
>
>          if (rx_status & (ABORT|OVERRUN|TOOLONG|RUNT|RXISERR|CRCERR|FAERR)) {
> @@ -1770,6 +1780,7 @@ static int sis900_rx(struct net_device *
>                  sis_priv->rx_ring[entry].cmdsts = 0;
>                  sis_priv->rx_ring[entry].bufptr = 0;
>                  sis_priv->stats.rx_dropped++;
> +                sis_priv->cur_rx++;
>                  break;
>              }
>              skb->dev = net_dev;
> @@ -1787,7 +1798,7 @@ static int sis900_rx(struct net_device *
>
>      /* refill the Rx buffer, what if the rate of refilling is slower
>       * than consuming ?? */
> -    for (;sis_priv->cur_rx - sis_priv->dirty_rx > 0; sis_priv->dirty_rx++) {
> +    for (; sis_priv->cur_rx != sis_priv->dirty_rx; sis_priv->dirty_rx++) {
>          struct sk_buff *skb;
>
>          entry = sis_priv->dirty_rx % NUM_RX_DESC;

With those corrections, the patch should be resent to me, to Jeff  
Garzik and to the netdev mailing list for review and possibly inclusion.
Thanks for the contribution.

--
Daniele Venzano
http://www.brownhat.org

