
Fix pool addition race and baseline raw parse panic #134

Open
dust-life wants to merge 1 commit into chainreactors:master from dust-life:fix/pool-addition-race-and-raw-parse

Conversation

@dust-life

Summary

  • fix a race where async producers may still send to additionCh after shutdown
  • avoid closing additionCh in BrutePool.Close() and rely on existing shutdown signals
  • remove redundant raw HTTP response reparsing in baseline.NewBaseline() and read Location directly from the live response

Why

This fixes two crash classes observed in production:

  • panic: send on closed channel
  • failures around raw response reparsing such as stacks involving bufio.(*Reader).Peek

Validation

  • go build ./... passes locally


Copilot AI left a comment


Pull request overview

This PR addresses two production crash classes by hardening pool shutdown behavior around additionCh and by eliminating a fragile raw HTTP response re-parse in baseline construction.

Changes:

  • Stop closing additionCh during BrutePool.Close(); introduce an additionClosed flag to reduce “send on closed channel” panics.
  • Add shutdown-awareness and panic recovery to BasePool.addAddition() to avoid enqueue crashes during teardown.
  • Remove pkg.ParseRawResponse usage in baseline.NewBaseline() and read redirect Location directly from the live response.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • core/pool/pool.go: adds additionClosed and updates addAddition() send behavior around shutdown/backpressure.
  • core/pool/brutepool.go: changes shutdown behavior to avoid closing additionCh and sets additionClosed.
  • core/baseline/baseline.go: removes raw response reparsing and reads the Location header directly from resp.
Comments suppressed due to low confidence (1)

core/pool/pool.go:78

  • addAddition spawns a new goroutine whenever additionCh is full. Under sustained load this can create an unbounded number of blocked goroutines waiting to send, increasing memory usage and potentially preventing clean shutdown. Prefer a blocking send with select { case additionCh <- u: ...; case <-ctx.Done(): ... } in the current goroutine (or a bounded enqueue worker), rather than spawning one goroutine per backpressured send.
	default:
		// forcibly suppress the panic to prevent a goroutine leak
		go func() {
			defer func() {
				if recover() != nil {
					pool.wg.Done()
				}
			}()
			select {
			case pool.additionCh <- u:
			case <-pool.ctx.Done():
				pool.wg.Done()
			}
		}()


Comment on lines 676 to 680
func (pool *BrutePool) Close() {
	for pool.analyzeDone {
		// wait for cached pending tasks to finish
		time.Sleep(time.Duration(100) * time.Millisecond)
	}

Copilot AI Mar 6, 2026


The wait loop condition in Close() looks inverted: for pool.analyzeDone { ... } will not wait when analyzeDone is false (the initial state), so shutdown proceeds immediately even though the async handler may still be running. This should likely be for !pool.analyzeDone { ... }, and processCh (or an equivalent signal) needs to be closed/canceled so Handler() can actually reach analyzeDone = true.

Contributor


Try to solve this problem, and write tests to verify it.

Comment on lines +681 to +683
pool.additionClosed.Store(true)
// additionCh may still have async producers (redirect/crawl/retry/append);
// rely on closeCh/ctx to stop the consumer loop instead of closing the channel.

Copilot AI Mar 6, 2026


additionClosed is only set in Close() after the main loop breaks, but there is still a race window where async producers (e.g., redirect/crawl goroutines) can call addAddition after the consumer loop has stopped. Since additionCh is no longer closed and ctx is not canceled on normal completion, those sends can block forever (or enqueue work that will never be drained). Consider canceling pool.ctx in Close() and/or setting additionClosed before closing closeCh (or removing the extra goroutine wrappers around addAddition so the wg.Add happens synchronously).

Contributor


How can we solve this potential problem? Previously we used the wg to check whether there were still pending goroutines that had not finished.

@chainreactorbot
Contributor

@copilot open a new pull request to apply changes based on the comments in this thread

Also, this change breaks functionality. Is there a way to fix this bug without breaking functionality?
The core error is at bufio.(*Reader).Peek(0xc01454e420, 0x1) | $GOROOT/src/bufio/bufio.go:144 +0x5d | github.c

-bl.Response, err = pkg.ParseRawResponse(bl.Raw)
-if err != nil {
-	bl.IsValid = false
-	bl.Reason = pkg.ErrResponseError.Error()
-	bl.ErrString = err.Error()
-	return bl
-}
-if r := bl.Response.Header.Get("Location"); r != "" {
+if r := resp.GetHeader("Location"); r != "" {
 	bl.RedirectURL = r
 } else {
-	bl.RedirectURL = bl.Response.Header.Get("location")
+	bl.RedirectURL = resp.GetHeader("location")
 }


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

core/pool/pool.go:76

  • addAddition increments pool.wg and may spawn a goroutine that blocks on pool.additionCh <- u until pool.ctx.Done(). In the normal completion path, BrutePool.Run() closes closeCh after wg.Wait() but does not cancel pool.ctx, so these send goroutines (and their corresponding wg increments) can block/leak after the consumer loop exits. This can also trigger sync: WaitGroup misuse panics if a late async producer calls wg.Add(1) while the shutdown goroutine is in wg.Wait() and the counter hits 0. Consider also selecting on pool.closeCh (treat it like a done signal) and/or canceling pool.ctx/setting additionClosed before any wg.Wait() begins so no new wg.Add() can occur during shutdown.
	if pool.ctx.Err() != nil || pool.additionClosed.Load() {
		return
	}

	pool.wg.Add(1)
	defer func() {
		if recover() != nil {
			pool.wg.Done()
		}
	}()

	select {
	case <-pool.ctx.Done():
		pool.wg.Done()
		return
	case pool.additionCh <- u:
		return
	default:
		// forcibly suppress the panic to prevent a goroutine leak
		go func() {
			defer func() {
				if recover() != nil {
					pool.wg.Done()
				}
			}()
			select {
			case pool.additionCh <- u:
			case <-pool.ctx.Done():
				pool.wg.Done()

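The "set additionClosed before any wg.Wait() begins" idea from the comment above can be sketched as a shutdown gate (names are illustrative and the actual work is elided): once the flag is flipped, addAddition refuses new work, so no late wg.Add(1) can race with Wait().

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// gate models the proposed ordering: flip additionClosed first, then wait.
type gate struct {
	additionClosed atomic.Bool
	wg             sync.WaitGroup
}

func (g *gate) addAddition() bool {
	if g.additionClosed.Load() {
		return false // shutdown has started; refuse new work
	}
	g.wg.Add(1)
	go g.wg.Done() // stand-in for the real enqueue work
	return true
}

func (g *gate) shutdown() {
	g.additionClosed.Store(true) // close the gate FIRST...
	g.wg.Wait()                  // ...then wait, so the counter cannot grow
}

func main() {
	g := &gate{}
	fmt.Println(g.addAddition()) // accepted before shutdown
	g.shutdown()
	fmt.Println(g.addAddition()) // rejected after shutdown
}
```

This avoids the `sync: WaitGroup misuse` panic the reviewer describes, because no Add can be issued once Wait is reachable.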

