Fix pool addition race and baseline raw parse panic #134
dust-life wants to merge 1 commit into chainreactors:master
Conversation
Pull request overview
This PR addresses two production crash classes by hardening pool shutdown behavior around `additionCh` and by eliminating a fragile raw HTTP response re-parse in baseline construction.
Changes:
- Stop closing `additionCh` during `BrutePool.Close()`; introduce an `additionClosed` flag to reduce "send on closed channel" panics.
- Add shutdown-awareness and panic recovery to `BasePool.addAddition()` to avoid enqueue crashes during teardown.
- Remove `pkg.ParseRawResponse` usage in `baseline.NewBaseline()` and read the redirect `Location` directly from the live response.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/pool/pool.go | Adds additionClosed and updates addAddition() send behavior around shutdown/backpressure. |
| core/pool/brutepool.go | Changes shutdown behavior to avoid closing additionCh and sets additionClosed. |
| core/baseline/baseline.go | Removes raw response reparsing and reads Location header directly from resp. |
Comments suppressed due to low confidence (1)
core/pool/pool.go:78
`addAddition` spawns a new goroutine whenever `additionCh` is full. Under sustained load this can create an unbounded number of blocked goroutines waiting to send, increasing memory usage and potentially preventing clean shutdown. Prefer a blocking send with `select { case additionCh <- u: ...; case <-ctx.Done(): ... }` in the current goroutine (or a bounded enqueue worker), rather than spawning one goroutine per backpressured send.
```go
	default:
		// Forcibly suppress the error to prevent goroutine leaks
		go func() {
			defer func() {
				if recover() != nil {
					pool.wg.Done()
				}
			}()
			select {
			case pool.additionCh <- u:
			case <-pool.ctx.Done():
				pool.wg.Done()
			}
		}()
```
```go
func (pool *BrutePool) Close() {
	for pool.analyzeDone {
		// wait for cached pending tasks to finish
		time.Sleep(time.Duration(100) * time.Millisecond)
	}
```
The wait loop condition in `Close()` looks inverted: `for pool.analyzeDone { ... }` will not wait when `analyzeDone` is false (the initial state), so shutdown proceeds immediately even though the async handler may still be running. This should likely be `for !pool.analyzeDone { ... }`, and `processCh` (or an equivalent signal) needs to be closed/canceled so `Handler()` can actually reach `analyzeDone = true`.
```go
	pool.additionClosed.Store(true)
	// additionCh may still have async producers (redirect/crawl/retry/append);
	// rely on closeCh/ctx to stop the consumer loop instead of closing the channel.
```
`additionClosed` is only set in `Close()` after the main loop breaks, but there is still a race window where async producers (e.g., redirect/crawl goroutines) can call `addAddition` after the consumer loop has stopped. Since `additionCh` is no longer closed and `ctx` is not canceled on normal completion, those sends can block forever (or enqueue work that will never be drained). Consider canceling `pool.ctx` in `Close()` and/or setting `additionClosed` before closing `closeCh` (or removing the extra goroutine wrappers around `addAddition` so the `wg.Add` happens synchronously).
How should we solve this potential problem? We previously used the `wg` to determine whether there were still suspended goroutines that had not finished.
@copilot open a new pull request to apply changes based on the comments in this thread. Also, this change would break functionality; is there a way to fix this bug without breaking functionality?
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
core/pool/pool.go:76
`addAddition` increments `pool.wg` and may spawn a goroutine that blocks on `pool.additionCh <- u` until `pool.ctx.Done()`. In the normal completion path, `BrutePool.Run()` closes `closeCh` after `wg.Wait()` but does not cancel `pool.ctx`, so these send goroutines (and their corresponding `wg` increments) can block/leak after the consumer loop exits. This can also trigger `sync: WaitGroup misuse` panics if a late async producer calls `wg.Add(1)` while the shutdown goroutine is in `wg.Wait()` and the counter hits 0. Consider also selecting on `pool.closeCh` (treat it like a done signal) and/or canceling `pool.ctx` / setting `additionClosed` before any `wg.Wait()` begins so no new `wg.Add()` can occur during shutdown.
```go
	if pool.ctx.Err() != nil || pool.additionClosed.Load() {
		return
	}
	pool.wg.Add(1)
	defer func() {
		if recover() != nil {
			pool.wg.Done()
		}
	}()
	select {
	case <-pool.ctx.Done():
		pool.wg.Done()
		return
	case pool.additionCh <- u:
		return
	default:
		// Forcibly suppress the error to prevent goroutine leaks
		go func() {
			defer func() {
				if recover() != nil {
					pool.wg.Done()
				}
			}()
			select {
			case pool.additionCh <- u:
			case <-pool.ctx.Done():
				pool.wg.Done()
			}
		}()
	}
```
Summary
- Guard sends to `additionCh` after shutdown
- Stop closing `additionCh` in `BrutePool.Close()` and rely on existing shutdown signals
- Remove the raw response re-parse in `baseline.NewBaseline()` and read `Location` directly from the live response
Why
This fixes two crash classes observed in production:
- `panic: send on closed channel`
- a panic in `bufio.(*Reader).Peek` during raw response re-parsing
Validation
- `go build ./...` passes locally