netdev.vger.kernel.org archive mirror
From: poma <pomidorabelisima@gmail.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH net] skge: dma_sync the whole receive buffer
Date: Wed, 21 Aug 2013 18:04:11 +0200
Message-ID: <5214E4FB.7010208@gmail.com>
In-Reply-To: <5212E249.2050203@gmail.com>

On 20.08.2013 05:28, poma wrote:
> On 19.08.2013 02:49, poma wrote:
>> On 15.08.2013 17:41, Stephen Hemminger wrote:
>>> On Wed, 14 Aug 2013 20:29:06 +0200
>>> poma <pomidorabelisima@gmail.com> wrote:
>>>
>>>> On 14.08.2013 18:20, Stephen Hemminger wrote:
>>>>> On Wed, 14 Aug 2013 12:20:03 +0200
>>>>> poma <pomidorabelisima@gmail.com> wrote:
>>>>>
>>>>>> On 14.08.2013 03:00, Stephen Hemminger wrote:
>>>>>>> On Tue, 13 Aug 2013 15:09:55 -0700 (PDT)
>>>>>>> David Miller <davem@davemloft.net> wrote:
>>>>>>>
>>>>>>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>>>>>>> Date: Sat, 10 Aug 2013 15:02:07 -0700
>>>>>>>>
>>>>>>>>> The DMA sync should sync the whole receive buffer, not just
>>>>>>>>> part of it. Fixes log messages dma_sync_check.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>>>>
>>>>>>>> Applied, but I really suspect that your "check DMA mapping errors"
>>>>>>>> patch has added a serious regression.  A regression much worse than
>>>>>>>> the bug you were trying to fix with that change.
>>>>>>>
>>>>>>> Argh. The problem is deeper than that. Device got broken somewhere between
>>>>>>> 3.2 and 3.4. My old Dlink card works on 3.2 but gets DMA errors on 3.4.
>>>>>>> The configs are different though, so checking that as well.
>>>>>>>
>>>>>>
>>>>>> Can I help you with debugging?
>>>>>> DGE-530T is rather solid device.
>>>>>
>>>>> Don't think it is a hardware problem.
>>>>> The failure is when the board accesses the Receive ring PCI memory area.
>>>>> This region is allocated with pci_alloc_consistent and therefore should
>>>>> be available. Two possible issues are driver math issues, or hardware
>>>>> problems with where the region is located. Some of these cards don't
>>>>> really have full 64 bit PCI support.
>>>>>
>>>>> My board is:
>>>>> 05:01.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter (rev 11)
>>>>> 	Subsystem: D-Link System Inc DGE-530T Gigabit Ethernet Adapter
>>>>> 	Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 18
>>>>> 	Memory at f7d20000 (32-bit, non-prefetchable) [size=16K]
>>>>> 	I/O ports at c000 [size=256]
>>>>> 	Expansion ROM at f7d00000 [disabled] [size=128K]
>>>>> 	Capabilities: [48] Power Management version 2
>>>>> 	Capabilities: [50] Vital Product Data
>>>>> 	Kernel driver in use: skge
>>>>>
>>>>>
>>>>> What is your config?
>>>>>
>>>>
>>>> 01:09.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
>>>> (rev 11)
>>>> 	Subsystem: D-Link System Inc DGE-530T Gigabit Ethernet Adapter
>>>> 	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
>>>> 	Memory at fbffc000 (32-bit, non-prefetchable) [size=16K]
>>>> 	I/O ports at b400 [size=256]
>>>> 	[virtual] Expansion ROM at ec000000 [disabled] [size=128K]
>>>> 	Capabilities: [48] Power Management version 2
>>>> 	Capabilities: [50] Vital Product Data
>>>> 	Kernel driver in use: skge
>>>>
>>>>
>>>> poma
>>>>
>>>
>>> In the course of debugging this, I moved the card to another slot
>>> and all the problems went away. I suspect either card insertion or more likely
>>> the crap consumer motherboards don't have full PCI support on some slots.
>>>
>>> There doesn't seem to be any way to address this in software.
>>>
>>
>>
>> DGE-530T is further tested in the 3 available slots:
>> 01:06.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
>> (rev 11)
>> 01:07.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
>> (rev 11)
>> 01:08.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
>> (rev 11)
>> And the result is the same as in the slot:
>> 01:09.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
>> (rev 11)
>> warnings, oopses and kernel crashes.
>>
>> However DGE-528T(RTL8110s) on the same bus runs without errors:
>> 01:09.0 Ethernet controller: D-Link System Inc DGE-528T Gigabit Ethernet
>> Adapter (rev 10)
>> 	Subsystem: D-Link System Inc DGE-528T Gigabit Ethernet Adapter
>> 	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
>> 	I/O ports at cc00 [size=256]
>> 	Memory at fbfff000 (32-bit, non-prefetchable) [size=256]
>> 	[virtual] Expansion ROM at fbe00000 [disabled] [size=128K]
>> 	Capabilities: [dc] Power Management version 2
>> 	Kernel driver in use: r8169
>>
>> Besides comparing the behavior of these two cards, e.g. NFS upload, I
>> noticed an obvious difference in the data flow.
>> Via DGE-528T transmission is steady, while via DGE-530T the traffic is
>> at times interrupted and unstable.
>> So it seems that the "WARNING: at lib/dma-debug.c:937 check_unmap…"
>> isn't just for fun.
>>
> 
> In support of the validity of the device I made a test with the
> 2.6.32-358.14.1.el6.x86_64.debug kernel.
> And everything worked as it should.
> 
> 01:08.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter
> (rev 11)
> 	Subsystem: D-Link System Inc DGE-530T Gigabit Ethernet Adapter
> 	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
> 	Memory at fbff8000 (32-bit, non-prefetchable) [size=16K]
> 	I/O ports at cc00 [size=256]
> 	[virtual] Expansion ROM at fbe00000 [disabled] [size=128K]
> 	Capabilities: [48] Power Management version 2
> 	Capabilities: [50] Vital Product Data
> 	Kernel driver in use: skge
> 	Kernel modules: skge
> 
> filename:
> /lib/modules/2.6.32-358.14.1.el6.x86_64.debug/kernel/drivers/net/skge.ko
> version:        1.13
> license:        GPL
> author:         Stephen Hemminger <shemminger@linux-foundation.org>
> description:    SysKonnect Gigabit Ethernet driver
> srcversion:     ADF6781C2E0D2D895F86279
> alias:          pci:v00001737d00001032sv*sd00000015bc*sc*i*
> alias:          pci:v00001737d00001064sv*sd*bc*sc*i*
> alias:          pci:v00001371d0000434Esv*sd*bc*sc*i*
> alias:          pci:v000011ABd00005005sv*sd*bc*sc*i*
> alias:          pci:v000011ABd00004320sv*sd*bc*sc*i*
> alias:          pci:v00001186d00004B01sv*sd*bc*sc*i*
> alias:          pci:v00001186d00004C00sv*sd*bc*sc*i*
> alias:          pci:v00001148d00004320sv*sd*bc*sc*i*
> alias:          pci:v00001148d00004300sv*sd*bc*sc*i*
> alias:          pci:v000010B7d000080EBsv*sd*bc*sc*i*
> alias:          pci:v000010B7d00001700sv*sd*bc*sc*i*
> depends:
> vermagic:       2.6.32-358.14.1.el6.x86_64.debug SMP mod_unload modversions
> parm:           debug:Debug level (0=none,...,16=all) (int)
> 
> 
> Given all the tests and all written, something isn't right, at all.
> Should I quote Shakespeare. :)
> 

Additionally, I have looked into the history of the problem and run a few
more tests.
The last kernel that worked flawlessly is from the 3.7.10 series;
I tested with the 3.7.10-400.fc19.x86_64.debug kernel.
The next series, 3.8, introduced the DMA-API problems:
"… device driver failed to check map error".
The example that follows shows how broken the skge module is in its
current state: the only result is a timeout.
The same result was obtained with the 3.11.0-0.rc6.git1.1.fc20.i686 kernel.
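
To make it easier to compare kernels, the dma-debug reports can be pulled
out of the kernel log. What follows is only a minimal sketch: the function
names are hypothetical, and it assumes the reports carry the literal tag
"DMA-API" or name lib/dma-debug.c, as in the two messages quoted in this
thread. Other kernels may word them differently.

```shell
#!/bin/sh
# Print only the log lines on stdin that look like dma-debug reports.
# Assumption: such lines contain "DMA-API" or "lib/dma-debug.c".
# "|| true" keeps the exit status at 0 when nothing matches.
dma_api_warnings() {
    grep -E 'DMA-API|lib/dma-debug\.c' || true
}

# Print how many such lines were seen on stdin (0 when there are none).
dma_api_warning_count() {
    grep -cE 'DMA-API|lib/dma-debug\.c' || true
}
```

It would be used as "dmesg | dma_api_warning_count" (reading the kernel log
may need root), once on the working kernel and once on the failing one.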

[CLIENT]

$ lspci -knn -d 1186:4c00
01:08.0 Ethernet controller [0200]: D-Link System Inc Gigabit Ethernet
Adapter [1186:4c00] (rev 11)
	Subsystem: D-Link System Inc DGE-530T Gigabit Ethernet Adapter [1186:4c00]
	Kernel driver in use: skge

$ modinfo skge
filename:
/lib/modules/3.11.0-0.rc6.git1.1.fc20.x86_64/kernel/drivers/net/ethernet/marvell/skge.ko
version:        1.14
license:        GPL
author:         Stephen Hemminger <shemminger@linux-foundation.org>
description:    SysKonnect Gigabit Ethernet driver
srcversion:     BF56B39CFC55B011E27DAB9
alias:          pci:v00001737d00001032sv*sd00000015bc*sc*i*
alias:          pci:v00001737d00001064sv*sd*bc*sc*i*
alias:          pci:v00001371d0000434Esv*sd*bc*sc*i*
alias:          pci:v000011ABd00005005sv*sd*bc*sc*i*
alias:          pci:v000011ABd00004320sv*sd*bc*sc*i*
alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
alias:          pci:v00001186d00004C00sv*sd*bc*sc*i*
alias:          pci:v00001186d00004B01sv*sd*bc*sc*i*
alias:          pci:v00001148d00004320sv*sd*bc*sc*i*
alias:          pci:v00001148d00004300sv*sd*bc*sc*i*
alias:          pci:v000010B7d000080EBsv*sd*bc*sc*i*
alias:          pci:v000010B7d00001700sv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.11.0-0.rc6.git1.1.fc20.x86_64 SMP mod_unload
signer:         Fedora kernel signing key
sig_key:        B1:4E:0F:25:52:6B:EE:0B:8B:66:BA:D6:38:99:D2:21:5D:37:E1:C1
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all) (int)

$ time ssh -vvv <SERVER_IP>
OpenSSH_6.2p2, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data $HOME/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 51: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to <SERVER_IP> [<SERVER_IP>] port 22.
debug1: Connection established.
debug1: identity file $HOME/.ssh/id_rsa type -1
debug1: identity file $HOME/.ssh/id_rsa-cert type -1
debug3: Incorrect RSA1 identifier
debug3: Could not load "$HOME/.ssh/id_dsa" as a RSA1 public key
debug1: identity file $HOME/.ssh/id_dsa type 2
debug1: identity file $HOME/.ssh/id_dsa-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.2
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.2
debug1: match: OpenSSH_6.2 pat OpenSSH*
debug2: fd 3 setting O_NONBLOCK
debug3: load_hostkeys: loading entries for host "<SERVER_IP>" from file
"$HOME/.ssh/known_hosts"
debug3: load_hostkeys: found key type RSA in file $HOME/.ssh/known_hosts:1
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs:
ssh-rsa-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-rsa
debug1: SSH2_MSG_KEXINIT sent
Connection to <SERVER_IP> timed out while waiting to read

real	1m0.133s
user	0m0.006s
sys	0m0.036s

# tcptrack -i enp1s8 port 22
Client                Server                State        Idle A Speed

 <CLIENT_IP>:53602     <SERVER_IP>:22        ESTABLISHED  1m    0 B/s

[\CLIENT]
.
.
[SERVER]

/var/log/secure
<DATE> <SERVER> sshd[25248]: Connection closed by <CLIENT_IP> [preauth]

[\SERVER]
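
The [1186:4c00] pair in the lspci output and the alias lines in the modinfo
output above are related by a fixed format: vendor and device ID, each
padded to eight upper-case hex digits. A small sketch of that mapping, with
hypothetical function names, for checking whether a module lists a given
card:

```shell
#!/bin/sh
# Build the leading part of a PCI alias from a vendor:device pair as
# printed by "lspci -nn", e.g. 1186:4c00 -> pci:v00001186d00004C00.
pci_alias_prefix() {
    vendor=${1%%:*}    # text before the colon
    device=${1##*:}    # text after the colon
    printf 'pci:v%08Xd%08X\n' "0x$vendor" "0x$device"
}

# Succeed if the modinfo output on stdin has an alias line for that ID.
module_claims_id() {
    grep -q "alias:.*$(pci_alias_prefix "$1")"
}
```

For example, "modinfo skge | module_claims_id 1186:4c00 && echo listed".
Both modinfo listings in this thread contain pci:v00001186d00004C00, so the
driver does list the DGE-530T in the old and the new kernel alike.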


Signor Greg, you are supposed to be a very resourceful guy, especially in
matters concerning hardware, so please, if you can, set aside some of your
valuable time and help us finally resolve this issue.


poma


The complete thread:
http://www.spinics.net/lists/netdev/msg245381.html

Thread overview: 25+ messages
2013-08-05  0:22 [PATCH net] skge: add dma_mapping check Stephen Hemminger
2013-08-05  1:35 ` David Miller
2013-08-05  3:40   ` [PATCH net] skge: fix build on 32 bit Stephen Hemminger
2013-08-05  6:37     ` David Miller
2013-08-10 11:51   ` [PATCH net] skge: add dma_mapping check poma
2013-08-10 17:41     ` Stephen Hemminger
2013-08-10 20:29       ` David Miller
2013-08-10 22:02         ` [PATCH net] skge: dma_sync the whole receive buffer Stephen Hemminger
2013-08-11  4:23           ` poma
2013-08-13 22:09           ` David Miller
2013-08-13 22:20             ` Stephen Hemminger
2013-08-14  1:00             ` Stephen Hemminger
2013-08-14 10:20               ` poma
2013-08-14 16:20                 ` Stephen Hemminger
2013-08-14 18:29                   ` poma
2013-08-15 15:41                     ` Stephen Hemminger
2013-08-16 14:36                       ` poma
2013-08-19  0:49                       ` poma
2013-08-20  3:28                         ` poma
2013-08-21 16:04                           ` poma [this message]
2013-08-22  0:40                             ` Greg KH
2013-08-22  3:30                               ` poma
2013-08-22  4:00                                 ` Greg KH
2013-08-22 14:46                                   ` poma
2013-08-22  4:08                                 ` Stephen Hemminger
