From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Kaike Wan <kaike.wan@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: [PATCH 5.4 31/78] IB/hfi1: Adjust flow PSN with the correct resync_psn
Date: Tue, 14 Jan 2020 11:01:05 +0100
Message-ID: <20200114094357.850365222@linuxfoundation.org>
In-Reply-To: <20200114094352.428808181@linuxfoundation.org>

From: Kaike Wan <kaike.wan@intel.com>

commit b2ff0d510182eb5cc05a65d1b2371af62c4b170c upstream.

When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the flow
PSN of the TID RDMA ACK packet. The resync_psn value indicates the last
flow PSN for which a TID RDMA WRITE DATA packet has been received by the
responder, and the requester should resend TID RDMA WRITE DATA packets
starting from the next flow PSN.

However, if resync_psn points to the last flow PSN for a segment and the
next segment flow PSN starts with a new generation number, use of the old
resync_psn to adjust the flow PSN for the next segment will lead to
miscalculation, resulting in WARN_ON and sge rewinding errors:

  WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
  Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
   crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
  CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.el7.x86_64 #1
  Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
  Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
  Call Trace:
   <IRQ>  [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
   [<ffffffff9dc97648>] __warn+0xd8/0x100
   [<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
   [<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
   [<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
   [<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
   [<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
   [<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
   [<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
   [<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
   [<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
   [<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
   [<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
   [<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
   [<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
   [<ffffffff9e36b362>] common_interrupt+0x162/0x162
   <EOI>  [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
   [<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
   [<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
   [<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
   [<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
   [<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
   [<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
   [<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
   [<ffffffff9dcc1c31>] kthread+0xd1/0xe0
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
   [<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40

This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.

Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
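Note (not part of the commit): the miscalculation described above can be
reproduced with a small standalone sketch. Everything in it is an
assumption made for illustration: the helper names mirror the driver's
but are reimplemented here, the shift and mask values (11-bit sequence,
31-bit PSN) should be checked against the hfi1 headers, and
resync_delta() and its apply_fix flag are hypothetical. It only models
the value added to flow->resync_npkts, before and after this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; verify against the hfi1 headers. */
#define SEQ_SHIFT 11u                        /* stands in for HFI1_KDETH_BTH_SEQ_SHIFT */
#define SEQ_MASK  ((1u << SEQ_SHIFT) - 1u)
#define PSN_MASK  0x7FFFFFFFu                /* assumed 31-bit PSN space */

static uint32_t mask_psn(uint32_t psn)
{
	return psn & PSN_MASK;
}

/* Combine a flow generation and a sequence number into a full flow PSN. */
static uint32_t full_flow_psn(uint32_t generation, uint32_t seq)
{
	assert(generation <= (PSN_MASK >> SEQ_SHIFT));
	return mask_psn((generation << SEQ_SHIFT) | (seq & SEQ_MASK));
}

/* Signed distance a - b within the masked PSN space. */
static int32_t delta_psn(uint32_t a, uint32_t b)
{
	uint32_t d = (a - b) & PSN_MASK;

	if (d & 0x40000000u)                 /* top bit set: negative distance */
		return (int32_t)((int64_t)d - 0x80000000LL);
	return (int32_t)d;
}

/*
 * Hypothetical model of the amount added to resync_npkts for one pending
 * flow. With apply_fix set, resync_psn is first moved to fpsn - 1 when the
 * flow's generation differs from the generation encoded in resync_psn.
 */
static int32_t resync_delta(uint32_t flow_gen, uint32_t flow_spsn,
			    uint32_t resync_psn, int apply_fix)
{
	uint32_t fpsn = full_flow_psn(flow_gen, flow_spsn);

	if (apply_fix && flow_gen != (resync_psn >> SEQ_SHIFT))
		resync_psn = mask_psn(fpsn - 1);
	return delta_psn(mask_psn(resync_psn + 1), fpsn);
}
```

Under these assumptions, a pending flow in generation 6 starting at
sequence 0, combined with a resync_psn of (5 << 11) | 9 from the previous
generation, yields -2038 without the adjustment and 0 with it. When the
generations match, the adjustment is skipped and the result is unchanged.
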
 drivers/infiniband/hw/hfi1/tid_rdma.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -4633,6 +4633,15 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi
 			 */
 			fpsn = full_flow_psn(flow, flow->flow_state.spsn);
 			req->r_ack_psn = psn;
+			/*
+			 * If resync_psn points to the last flow PSN for a
+			 * segment and the new segment (likely from a new
+			 * request) starts with a new generation number, we
+			 * need to adjust resync_psn accordingly.
+			 */
+			if (flow->flow_state.generation !=
+			    (resync_psn >> HFI1_KDETH_BTH_SEQ_SHIFT))
+				resync_psn = mask_psn(fpsn - 1);
 			flow->resync_npkts +=
 				delta_psn(mask_psn(resync_psn + 1), fpsn);
 			/*
