From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Kazior
Date: Fri, 14 Jun 2013 13:46:32 +0200
Subject: [ath9k-devel] [PATCH 5/6] ath10k: wait for CE to drain when shutting down
In-Reply-To: <87ip1hopd9.fsf@kamboji.qca.qualcomm.com>
References: <1371040066-17631-1-git-send-email-michal.kazior@tieto.com>
 <1371040066-17631-6-git-send-email-michal.kazior@tieto.com>
 <87ip1hopd9.fsf@kamboji.qca.qualcomm.com>
Message-ID: <51BB0298.2070807@tieto.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

On 13/06/13 20:08, Kalle Valo wrote:
> Michal Kazior writes:
>
>> ath10k_pci_process_ce() is used to process
>> completions. Only one thread can do that though.
>>
>> If one thread starts handling completions then the
>> other one (i.e. possibly PCI shutdown) would exit
>> immediatelely and free up memory while completions
>> are being processed leading to corruption.
>>
>> Signed-off-by: Michal Kazior
>
> [...]
>
>> +/* This function assumes no new data is going to be submitted/completed. It is
>> + * mainly intended to flush out completions when stopping the device. */
>> +static void ath10k_pci_wait_for_ce_drain(struct ath10k *ar)
>> +{
>> +	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
>> +	int ret;
>> +
>> +	ret = wait_event_timeout(ar_pci->compl_wq, ({
>> +		bool processing;
>> +		spin_lock_bh(&ar_pci->compl_lock);
>> +		processing = ar_pci->compl_processing;
>> +		spin_unlock_bh(&ar_pci->compl_lock);
>> +		(!processing);
>> +	}), 5*HZ);
>> +	if (ret == 0)
>> +		ath10k_warn("timed out while waiting for completions to be processed\n");
>> +}
>
> This looks like a hack to me. Wouldn't it be a better to fix make sure
> that all threads/tasklets are stopped, for example with tasklet_kill()
> and cancel_work_sync()? (And of course first making sure that we don't
> fire new instances). That way we could be sure that there is no other
> thread running while we shutdown.

Apparently I can't reproduce this bug anymore. I can't really locate
where the other thread could be coming from. Maybe an interrupt? We
don't unregister interrupt handlers at the point where the issue happens
(we only stop CE interrupts via registers).

Perhaps we can drop this patch for now and look for a proper solution
later on?

-- 
Pozdrawiam / Best regards, Michal Kazior.