Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#1033 closed defect (fixed)

Assertion error when shutting down PJSIP while TCP/TLS connect is in progress and a transaction is waiting (thanks Tamàs Solymosi for the report)

Reported by: bennylp Owned by: bennylp
Priority: normal Milestone: release-1.6
Component: pjsip Version: trunk
Keywords: Cc:
Backport to 1.x milestone: Backported:

Description

An assertion fails in sip_transaction.c when the transaction layer is unloaded while a transaction is still waiting for a transport operation to complete, for example while a TCP/TLS transport is still connecting to the destination.

Typical stack trace:

17 pj_thread_local_get()  ..\pjlib\src\pj\os_core_symbian.cpp:699 0x789f0dd4
16 lock_tsx()             ..\pjsip\src\pjsip\sip_transaction.c:902 0x78973914
15 send_msg_callback()    ..\pjsip\src\pjsip\sip_transaction.c:1642 0x7897543c
14 stateless_send_transport_cb() ..\pjsip\src\pjsip\sip_util.c:1074 0x78987c40
13 transport_send_callback() ..\pjsip\src\pjsip\sip_transport.c:610 0x78978e60
12 on_data_sent()         ..\pjsip\src\pjsip\sip_transport_tls.c:1009 0x78981204
11 tls_destroy()          ..\pjsip\src\pjsip\sip_transport_tls.c:662 0x7898067c
10 tls_destroy_transport() ..\pjsip\src\pjsip\sip_transport_tls.c:621 0x78980578
9 destroy_transport()     ..\pjsip\src\pjsip\sip_transport.c:941 0x78979878
8 pjsip_tpmgr_destroy()   ..\pjsip\src\pjsip\sip_transport.c:1258 0x7897a250
7 pjsip_endpt_destroy()   ..\pjsip\src\pjsip\sip_endpoint.c:595 0x789643bc
6 pjsua_destroy()         ..\pjsip\src\pjsua-lib\pjsua_core.c:1596 0x7892c4f8

Change History (7)

comment:1 Changed 10 years ago by bennylp

Additional info:

This happens because a transport that is being destroyed cancels all pending send operations on that transport instance. The cancellation then reaches the transaction instance, which tries to cancel and unregister itself from the transaction layer module. At that point, however, the transaction layer module has already been destroyed, because it has a higher priority number than the transport layer module, and this triggers the assertion.
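
The ordering problem described above can be illustrated with a small stand-alone model. This is only a sketch: every `toy_*` name is hypothetical and none of it is PJSIP code. It assumes the transaction layer is torn down first and the transport afterwards, and it counts the late callback instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not PJSIP APIs. */
typedef struct toy_tsx_layer {
    bool alive;       /* false once the layer has been unloaded */
    int  registered;  /* transactions still registered with the layer */
} toy_tsx_layer;

typedef struct toy_tsx {
    toy_tsx_layer *layer;
} toy_tsx;

typedef void (*toy_send_cb)(void *token, int status);

typedef struct toy_transport {
    toy_send_cb pending_cb;     /* completion callback of a pending send */
    void       *pending_token;  /* here: the transaction that is waiting */
} toy_transport;

/* Counts callbacks that arrived after the layer was already gone. */
static int toy_use_after_destroy = 0;

/* Send-completion callback owned by the transaction. */
static void toy_tsx_on_send_done(void *token, int status)
{
    toy_tsx *tsx = (toy_tsx *)token;

    if (status >= 0)
        return;  /* send succeeded, nothing to unregister */

    if (!tsx->layer->alive) {
        /* In the ticket, this is the point where the assertion fires:
         * the transaction touches a layer that no longer exists. */
        toy_use_after_destroy++;
        return;
    }
    tsx->layer->registered--;  /* normal unregistration */
}

/* Destroying a transport cancels its pending send with an error status. */
static void toy_transport_destroy(toy_transport *tp)
{
    assert(tp != NULL);
    if (tp->pending_cb != NULL) {
        toy_send_cb cb = tp->pending_cb;
        tp->pending_cb = NULL;
        cb(tp->pending_token, -1);
    }
}

/* Reproduces the shutdown order from the ticket; returns the number of
 * late callbacks observed. */
static int toy_shutdown(void)
{
    toy_tsx_layer layer = { true, 1 };
    toy_tsx       tsx   = { &layer };
    toy_transport tp    = { toy_tsx_on_send_done, &tsx };

    toy_use_after_destroy = 0;
    layer.alive = false;          /* transaction layer unloaded first */
    toy_transport_destroy(&tp);   /* transport torn down afterwards  */
    return toy_use_after_destroy;
}
```

In this model `toy_shutdown()` reports one late callback, which corresponds to the single pending send in the traces.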

A different but similar stack trace on Linux:

Thread [1] (Suspended: Signal 'SIGABRT' received. Description: Aborted.)	
	18 raise()  0x00007ffff692c4b5	
	17 abort()  0x00007ffff692ff50	
	16 __assert_fail()  0x00007ffff6925481	
	15 pj_mutex_unlock() ..pj/pjlib/src/pj/os_core_unix.c:1251 0x000000000053c3ec	
	14 mod_tsx_layer_unregister_tsx() ..pj/pjsip/src/pjsip/sip_transaction.c:605 0x000000000046b3fb	
	13 tsx_set_state() ..pj/pjsip/src/pjsip/sip_transaction.c:1137 0x000000000046c1c9	
	12 send_msg_callback() ..pj/pjsip/src/pjsip/sip_transaction.c:1754 0x000000000046d79b	
	11 stateless_send_transport_cb() ..pj/pjsip/src/pjsip/sip_util.c:1091 0x00000000004593fd	
	10 transport_send_callback() ..pj/pjsip/src/pjsip/sip_transport.c:597 0x000000000045cbce	
	9 on_data_sent() ..pj/pjsip/src/pjsip/sip_transport_tcp.c:1001 0x00000000004626d7	
	8 tcp_destroy() ..pj/pjsip/src/pjsip/sip_transport_tcp.c:691 0x0000000000461c87	
	7 tcp_destroy_transport() ..pj/pjsip/src/pjsip/sip_transport_tcp.c:649 0x0000000000461b67	
	6 destroy_transport() ..pj/pjsip/src/pjsip/sip_transport.c:922 0x000000000045d545	
	5 pjsip_tpmgr_destroy() ..pj/pjsip/src/pjsip/sip_transport.c:1238 0x000000000045ddde	
	4 pjsip_endpt_destroy() ..pj/pjsip/src/pjsip/sip_endpoint.c:592 0x0000000000455fa6	
	3 pjsua_destroy() ..pj/pjsip/src/pjsua-lib/pjsua_core.c:1362 0x0000000000421d17	
	2 app_destroy() ..pj/pjsip-apps/src/pjsua/pjsua_app.c:4794 0x0000000000411dee	
	1 main() ..pj/pjsip-apps/src/pjsua/main.c:88 0x00000000004073ae	

Steps to reproduce this:

  • modify sip_transport_tcp.c:865 so that the connect stays pending:
     //status = pj_activesock_start_connect(tcp->asock, tcp->base.pool, rem_addr,
     //				 sizeof(pj_sockaddr_in));
     status = PJ_EPENDING;
    
  • send an OPTIONS message to a TCP destination
  • quit pjsua

comment:2 Changed 10 years ago by bennylp

Initial fix in r3071:

  • added a cancellation mechanism that lets a transaction ignore the send message completion callback once the transaction has been destroyed
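
One way such a cancellation mechanism could look is sketched below. This is not the code from r3071: the `sketch_*` names are hypothetical, and it assumes the pending-send record outlives the transaction. The transaction detaches itself from the record when it is destroyed, and the completion callback does nothing if the record has no owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in PJSIP. */
typedef struct sketch_tsx sketch_tsx;

/* Record handed to the transport for one pending send. */
typedef struct sketch_pending {
    sketch_tsx *tsx;  /* owner; NULL once the send has been cancelled */
} sketch_pending;

struct sketch_tsx {
    sketch_pending *pending;            /* NULL when nothing is pending */
    int             callbacks_handled;  /* completions actually processed */
};

/* Destroying the transaction cancels its pending send, if there is one. */
static void sketch_tsx_destroy(sketch_tsx *tsx)
{
    if (tsx->pending != NULL) {
        tsx->pending->tsx = NULL;  /* detach: later callbacks are ignored */
        tsx->pending = NULL;
    }
}

/* Completion callback invoked by the transport.
 * Returns true if the transaction processed it, false if it was ignored. */
static bool sketch_send_done(sketch_pending *p, int status)
{
    assert(p != NULL);
    (void)status;

    if (p->tsx == NULL)
        return false;  /* cancelled: the transaction is gone */

    p->tsx->callbacks_handled++;
    p->tsx->pending = NULL;
    return true;
}

/* Small driver: fires the callback with or without destroying the
 * transaction first, and returns how many callbacks were processed. */
static int sketch_demo(bool destroy_first)
{
    sketch_pending pending = { NULL };
    sketch_tsx     tsx     = { &pending, 0 };

    pending.tsx = &tsx;
    if (destroy_first)
        sketch_tsx_destroy(&tsx);

    sketch_send_done(&pending, -1);
    return tsx.callbacks_handled;
}
```

With `destroy_first` set, the callback is ignored and the transaction state is never touched; without it, the callback is processed once.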

comment:3 Changed 10 years ago by bennylp

  • Resolution set to fixed
  • Status changed from new to closed

comment:4 Changed 10 years ago by bennylp

In r3084:

  • fix for r3071: added protection for the case where the TSX_HAS_PENDING_TRANSPORT flag is set on the transaction but pending_tx is NULL, which caused a crash

comment:5 Changed 10 years ago by bennylp

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopened:

It has been reported that r3071 causes the transaction to refuse to send ACK.

comment:6 Changed 10 years ago by bennylp

  • Resolution set to fixed
  • Status changed from reopened to closed

In r3090:

  • Fixed the problem above, which caused ACK not to be sent. This happened when TCP switching was in use and the TCP transport failed to send the request.

comment:7 Changed 10 years ago by nanang

In r3114:

  • Fixed send_msg_callback in transaction.c to reset the 'cont' flag so that (re)transmission stops once the transaction has been unregistered.