Ticket #2079 (closed defect: fixed)

Opened 10 months ago

Last modified 10 months ago

Crash in pjsip due to race condition in account's keep alive timer

Reported by: ming
Owned by: bennylp
Priority: normal
Milestone: release-2.8
Component: pjsua-lib
Version: trunk
Keywords:
Cc:
Backport to 1.x milestone:
Backported: no

Description

Scenario:

  • An active account re-registers
  • The re-registration fails (such as when the DNS resolution times out after 10 seconds)
  • At exactly the same time, the keep-alive timer fires.

The issue was consistently reproducible under very high network load, with a re-registration interval of 5 seconds, the default DNS timeout (10 seconds) and the default keep-alive interval (15 seconds). Because 5 + 10 seconds equals 15 seconds, the failed re-registration and the keep-alive timer coincide, which leads to the following backtrace:

Thread #1 (Suspended : Signal : SIGSEGV:Segmentation fault)
	keep_alive_timer_cb() at pjsua_acc.c:1981 0x76bcb424
	pj_timer_heap_poll() at timer.c:643 0x76e43244
	pjsip_endpt_handle_events2() at sip_endpoint.c:713 0x76cc0dd8
	worker_thread() at pjsua_core.c:695 0x76bde404
Thread #2
	__pthread_mutex_unlock_usercnt() at pthread_mutex_unlock.c:66 0x4940a4d8
	__GI___pthread_mutex_unlock() at pthread_mutex_unlock.c:315 0x4940a588
	pj_mutex_unlock() at os_core_unix.c:1323 0x76e4241c
	PJSUA_UNLOCK() at pjsua_internal.h:584 0x76be4244
	pjsua_acc_set_registration() at pjsua_acc.c:2682 0x76be4244
	pj::Account::getInfo() at account.cpp:737 0x76e86438
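
The timing in the scenario can be checked with a few lines of C. This is only an illustrative sketch: the function names are hypothetical, and it assumes that the keep-alive timer was armed at t = 0, at the same moment as the last successful registration.

```c
/* All times are in seconds, measured from the last successful registration.
 * Assumption (not stated in the ticket): the keep-alive timer is armed at
 * that same moment, t = 0. */

/* Time at which the failed re-registration is reported: the re-registration
 * starts after rereg_interval and then waits for the DNS timeout. */
static int rereg_failure_time(int rereg_interval, int dns_timeout)
{
    return rereg_interval + dns_timeout;
}

/* Returns 1 when the re-registration failure and the keep-alive timer
 * land on the same second, 0 otherwise. */
static int events_coincide(int rereg_interval, int dns_timeout,
                           int keepalive_interval)
{
    return rereg_failure_time(rereg_interval, dns_timeout)
           == keepalive_interval;
}
```

With the values from the report, events_coincide(5, 10, 15) returns 1. Changing any one of the three values (for example a 6 second re-registration interval) makes it return 0, which may explain why the crash depends on this particular configuration.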

There are two problems here:

  • timer.c: pj_timer_heap_poll() places the timer entry on the freelist and releases the global lock before calling the callback, so the callback may operate on a timer entry that has already been freed. Proposed fix: keep timer_entry out of the freelist until the callback is done.
  • pjsua_acc.c: Even with the first issue fixed, the account registration can still be canceled exactly when the callback fires, because the lock is released before the callback is called. The cancellation sets ka_transport to NULL, and the callback then crashes. Proposed fix: protect against NULL in ka_transport.
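
The two proposed fixes can be sketched in a simplified, single-threaded model. Everything prefixed with toy_ is a hypothetical stand-in and not a pjsip type or function; the actual change committed for this ticket may look different.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the transport, account and timer entry. */
typedef struct toy_transport {
    int sent;                      /* number of keep-alive packets "sent" */
} toy_transport;

typedef struct toy_account {
    toy_transport *ka_transport;   /* NULL once registration is canceled */
} toy_account;

typedef struct toy_timer_entry {
    void (*cb)(struct toy_timer_entry *entry);
    void *user_data;
    int   in_freelist;             /* 1 while the slot may be reused */
} toy_timer_entry;

/* Second fix: return early when ka_transport has been cleared. */
static void toy_keep_alive_cb(toy_timer_entry *entry)
{
    toy_account *acc = (toy_account *)entry->user_data;

    if (acc == NULL || acc->ka_transport == NULL)
        return;
    acc->ka_transport->sent++;
}

/* First fix: the entry joins the freelist only after the callback returns. */
static void toy_poll_one(toy_timer_entry *entry)
{
    entry->in_freelist = 0;
    /* A real timer heap would release its lock here ... */
    entry->cb(entry);
    /* ... and re-acquire it here, before recycling the entry. */
    entry->in_freelist = 1;
}

/* Fires the timer once with a transport and once after cancellation.
 * Returns the number of keep-alives sent. */
static int toy_demo(void)
{
    toy_transport tp = { 0 };
    toy_account acc = { &tp };
    toy_timer_entry entry = { toy_keep_alive_cb, &acc, 0 };

    toy_poll_one(&entry);          /* transport present: one keep-alive */
    acc.ka_transport = NULL;       /* registration canceled in between */
    toy_poll_one(&entry);          /* guarded: NULL is not dereferenced */
    return tp.sent;
}
```

toy_demo() returns 1: the first poll sends a keep-alive and the second is skipped by the NULL check. The model has no threads, so it only shows the ordering and the guard. In multi-threaded code the NULL check would also have to be made under whatever lock protects ka_transport.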

Thanks to Martin Oberhuber for the report and the patch.

Change History

comment:1 Changed 10 months ago by ming

  • Status changed from new to closed
  • Resolution set to fixed

In 5720:

Fixed #2079: Crash in pjsip due to race condition in account's keep alive timer
