wiki:helgrind

Version 1 (modified by ming, 9 years ago)

--

This wiki page describes the issues from ticket #1820 (Helgrind thread error detector test and analysis).

Helgrind issue #1

Description: Possible data race over concurrent read/write access to a thread's quit flag: one thread writes the quit flag while another reads it.
Status: false positive; the concurrent access is intentional behaviour.

==26641== ---Thread-Announcement------------------------------------------
==26641==
==26641== Thread #6 was created
==26641==    at 0x5953FCE: clone (clone.S:74)
==26641==    by 0x5352199: do_clone.constprop.3 (createthread.c:75)
==26641==    by 0x53538BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==26641==    by 0x4C30C90: pthread_create_WRK (hg_intercepts.c:269)
==26641==    by 0x56704B: pj_thread_create (os_core_unix.c:616)
==26641==    by 0x49CC57: pjmedia_endpt_create (endpoint.c:169)
==26641==    by 0x43102A: pjsua_media_subsys_init (pjsua_media.c:80)
==26641==    by 0x42B576: pjsua_init (pjsua_core.c:1058)
==26641==    by 0x40800A: app_init (pjsua_app.c:1346)
==26641==    by 0x408F1A: pjsua_app_init (pjsua_app.c:1881)
==26641==    by 0x405AA9: main_func (main.c:108)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)
==26641==
==26641== Possible data race during write of size 4 at 0x8182E8 by thread #1
==26641== Locks held: none
==26641==    at 0x42AA5E: pjsua_stop_worker_threads (pjsua_core.c:730)
==26641==    by 0x42C753: pjsua_destroy2 (pjsua_core.c:1548)
==26641==    by 0x42CDC9: pjsua_destroy (pjsua_core.c:1775)
==26641==    by 0x409291: app_destroy (pjsua_app.c:2011)
==26641==    by 0x409304: pjsua_app_destroy (pjsua_app.c:2035)
==26641==    by 0x405ADF: main_func (main.c:116)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)
==26641==    by 0x405B32: main (main.c:129)
==26641==
==26641== This conflicts with a previous read of size 4 by thread #7
==26641== Locks held: none
==26641==    at 0x42A978: worker_thread (pjsua_core.c:691)
==26641==    by 0x566E30: thread_main (os_core_unix.c:523)
==26641==    by 0x4C30E26: mythread_wrapper (hg_intercepts.c:233)
==26641==    by 0x5353181: start_thread (pthread_create.c:312)
==26641==    by 0x595400C: clone (clone.S:111)

Helgrind issue #2

Lock order inversion involving pj_rwmutex_lock_read().
Both threads only acquire read locks, which should be safe. However, on some platforms a read-write mutex may be implemented as a regular mutex, in which case a deadlock can occur.

==26641== Thread #7: lock order "0x5EDC798 before 0x5E25638" violated
(acquiring only read locks should be OK)
==26641==
==26641== Observed (incorrect) order is: acquisition of lock at 0x5E25638
==26641==    at 0x4C2FE45: pthread_rwlock_rdlock_WRK (hg_intercepts.c:1549)
==26641==    by 0x567DA5: pj_rwmutex_lock_read (os_core_unix.c:1448)
==26641==    by 0x46D6A5: pjsip_endpt_process_rx_data (sip_endpoint.c:853)
==26641==    by 0x46DAD0: endpt_on_rx_msg (sip_endpoint.c:1036)
==26641==    by 0x476652: pjsip_tpmgr_receive_packet (sip_transport.c:1789)
==26641==    by 0x4774DC: udp_on_read_complete (sip_transport_udp.c:173)
==26641==    by 0x563CFC: ioqueue_dispatch_read_event (ioqueue_common_abs.c:591)
==26641==    by 0x565E20: pj_ioqueue_poll (ioqueue_select.c:963)
==26641==    by 0x46D47C: pjsip_endpt_handle_events2 (sip_endpoint.c:741)
==26641==    by 0x42CE6D: pjsua_handle_events (pjsua_core.c:1833)
==26641==    by 0x42A964: worker_thread (pjsua_core.c:694)
==26641==    by 0x566E30: thread_main (os_core_unix.c:523)
==26641==
==26641==  followed by a later acquisition of lock at 0x5EDC798
==26641==    at 0x4C32145: pthread_mutex_lock (hg_intercepts.c:518)
==26641==    by 0x5679DB: pj_mutex_lock (os_core_unix.c:1243)
==26641==    by 0x56ED6F: pj_lock_acquire (lock.c:180)
==26641==    by 0x56EFED: grp_lock_acquire (lock.c:290)
==26641==    by 0x56F471: pj_grp_lock_acquire (lock.c:437)
==26641==    by 0x483370: pjsip_tsx_recv_msg (sip_transaction.c:1764)
==26641==    by 0x481802: mod_tsx_layer_on_rx_response (sip_transaction.c:872)
==26641==    by 0x46D7E6: pjsip_endpt_process_rx_data (sip_endpoint.c:894)
==26641==    by 0x46DAD0: endpt_on_rx_msg (sip_endpoint.c:1036)
==26641==    by 0x476652: pjsip_tpmgr_receive_packet (sip_transport.c:1789)
==26641==    by 0x4774DC: udp_on_read_complete (sip_transport_udp.c:173)
==26641==    by 0x563CFC: ioqueue_dispatch_read_event (ioqueue_common_abs.c:591)
==26641==
==26641== Required order was established by acquisition of lock at 0x5EDC798
==26641==    at 0x4C32145: pthread_mutex_lock (hg_intercepts.c:518)
==26641==    by 0x5679DB: pj_mutex_lock (os_core_unix.c:1243)
==26641==    by 0x56ED6F: pj_lock_acquire (lock.c:180)
==26641==    by 0x56EFED: grp_lock_acquire (lock.c:290)
==26641==    by 0x56F471: pj_grp_lock_acquire (lock.c:437)
==26641==    by 0x48326E: pjsip_tsx_send_msg (sip_transaction.c:1721)
==26641==    by 0x486252: pjsip_endpt_send_request (sip_util_statefull.c:117)
==26641==    by 0x44E1FB: pjsip_regc_send (sip_reg.c:1410)
==26641==    by 0x41DA10: pjsua_acc_set_registration (pjsua_acc.c:2527)
==26641==    by 0x418489: pjsua_acc_add (pjsua_acc.c:485)
==26641==    by 0x408CC7: app_init (pjsua_app.c:1809)
==26641==    by 0x408F1A: pjsua_app_init (pjsua_app.c:1881)
==26641==
==26641==  followed by a later acquisition of lock at 0x5E25638
==26641==    at 0x4C2FE45: pthread_rwlock_rdlock_WRK (hg_intercepts.c:1549)
==26641==    by 0x567DA5: pj_rwmutex_lock_read (os_core_unix.c:1448)
==26641==    by 0x46DB92: endpt_on_tx_msg (sip_endpoint.c:1066)
==26641==    by 0x474740: pjsip_transport_send (sip_transport.c:802)
==26641==    by 0x470CCF: stateless_send_transport_cb (sip_util.c:1243)
==26641==    by 0x471017: stateless_send_resolver_callback (sip_util.c:1344)
==26641==    by 0x4727F6: pjsip_resolve (sip_resolve.c:306)
==26641==    by 0x46DDB8: pjsip_endpt_resolve (sip_endpoint.c:1148)
==26641==    by 0x471132: pjsip_endpt_send_request_stateless (sip_util.c:1388)
==26641==    by 0x4840E7: tsx_send_msg (sip_transaction.c:2119)
==26641==    by 0x484875: tsx_on_state_null (sip_transaction.c:2349)
==26641==    by 0x4832A3: pjsip_tsx_send_msg (sip_transaction.c:1727)

Helgrind issue #3

PulseAudio related.
Status: outside PJSIP's scope
TODO: create a suppression file for this

==26641== Possible data race during read of size 4 at 0x6CEBC28 by thread #1
==26641== Locks held: none
==26641==    at 0x6AB5FE1: pa_once_begin (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so)
==26641==    by 0x6AB616A: pa_run_once (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so)
==26641==    by 0x6ACAE43: pa_thread_self (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so)
==26641==    by 0x666146B: pa_threaded_mainloop_lock (in /usr/lib/x86_64-linux-gnu/libpulse.so.0.16.2)
==26641==    by 0x8F573AF: pulse_connect (in /usr/lib/x86_64-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==26641==    by 0x8F56C9B: _snd_pcm_pulse_open (in /usr/lib/x86_64-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==26641==    by 0x55B36FC: ??? (in /usr/lib/x86_64-linux-gnu/libasound.so.2.0.0)
==26641==    by 0x55B3CF5: ??? (in /usr/lib/x86_64-linux-gnu/libasound.so.2.0.0)
==26641==    by 0x559982: OpenPcm (pa_linux_alsa.c:544)
==26641==    by 0x559A34: FillInDevInfo (pa_linux_alsa.c:580)
==26641==    by 0x55A9B8: BuildDeviceList (pa_linux_alsa.c:853)
==26641==    by 0x558E8B: PaAlsa_Initialize (pa_linux_alsa.c:266)
==26641==
==26641== This conflicts with a previous write of size 4 by thread #2
==26641== Locks held: 1, at address 0x5E41130
==26641==    at 0x6AB6093: pa_once_end (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so)
==26641==    by 0x6ACAEEC: ??? (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so)
==26641==    by 0x4C30E26: mythread_wrapper (hg_intercepts.c:233)
==26641==    by 0x5353181: start_thread (pthread_create.c:312)
==26641==    by 0x595400C: clone (clone.S:111)
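For the TODO above, a Helgrind suppression keyed on the libpulse frames in the report could look roughly like the fragment below; the suppression name and the exact frames to match are assumptions to be tuned against the full report (run valgrind with --gen-suppressions=all to get exact candidates), and the file would be passed via --suppressions=pulse.supp:

```
{
   pulse-pa_once-race
   Helgrind:Race
   fun:pa_once_begin
   fun:pa_run_once
   fun:pa_thread_self
}
```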

Helgrind issue #4

Data race over access to the g_last_thread variable.
Status: won't fix. The variable is used for informational purposes only, and mutex locking on every log call would be quite expensive.

==26641== Lock at 0x5E24F98 was first observed
==26641==    at 0x4C31DDA: pthread_mutex_init (hg_intercepts.c:443)
==26641==    by 0x567730: init_mutex (os_core_unix.c:1139)
==26641==    by 0x567909: pj_mutex_create (os_core_unix.c:1192)
==26641==    by 0x56798B: pj_mutex_create_recursive (os_core_unix.c:1220)
==26641==    by 0x42AD68: pjsua_create (pjsua_core.c:823)
==26641==    by 0x407E0C: app_init (pjsua_app.c:1288)
==26641==    by 0x408F1A: pjsua_app_init (pjsua_app.c:1881)
==26641==    by 0x405AA9: main_func (main.c:108)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)
==26641==    by 0x405B32: main (main.c:129)
==26641==
==26641== Lock at 0x5EDC798 was first observed
==26641==    at 0x4C31DDA: pthread_mutex_init (hg_intercepts.c:443)
==26641==    by 0x567730: init_mutex (os_core_unix.c:1139)
==26641==    by 0x567909: pj_mutex_create (os_core_unix.c:1192)
==26641==    by 0x56EB58: create_mutex_lock (lock.c:75)
==26641==    by 0x56EBE5: pj_lock_create_recursive_mutex (lock.c:96)
==26641==    by 0x56F3D5: pj_grp_lock_create (lock.c:413)
==26641==    by 0x481C13: tsx_create (sip_transaction.c:1011)
==26641==    by 0x48258F: pjsip_tsx_create_uac2 (sip_transaction.c:1306)
==26641==    by 0x482427: pjsip_tsx_create_uac (sip_transaction.c:1270)
==26641==    by 0x4861C2: pjsip_endpt_send_request (sip_util_statefull.c:103)
==26641==    by 0x44E1FB: pjsip_regc_send (sip_reg.c:1410)
==26641==    by 0x41DA10: pjsua_acc_set_registration (pjsua_acc.c:2527)
==26641==
==26641== Possible data race during write of size 8 at 0x8100E0 by thread #7
==26641== Locks held: 2, at addresses 0x5E24F98 0x5EDC798
==26641==    at 0x570214: pj_log (log.c:421)
==26641==    by 0x570675: pj_log_3 (log.c:515)
==26641==    by 0x41BB3B: acc_check_nat_addr (pjsua_acc.c:1674)
==26641==    by 0x41C996: regc_tsx_cb (pjsua_acc.c:2093)
==26641==    by 0x44D7BF: regc_tsx_callback (sip_reg.c:1092)
==26641==    by 0x486100: mod_util_on_tsx_state (sip_util_statefull.c:81)
==26641==    by 0x4822A9: tsx_set_state (sip_transaction.c:1210)
==26641==    by 0x4858B5: tsx_on_state_proceeding_uac (sip_transaction.c:3008)
==26641==    by 0x484C09: tsx_on_state_calling (sip_transaction.c:2492)
==26641==    by 0x48338B: pjsip_tsx_recv_msg (sip_transaction.c:1765)
==26641==    by 0x481802: mod_tsx_layer_on_rx_response (sip_transaction.c:872)
==26641==    by 0x46D7E6: pjsip_endpt_process_rx_data (sip_endpoint.c:894)
==26641==
==26641== This conflicts with a previous read of size 8 by thread #1
==26641== Locks held: none
==26641==    at 0x5701E8: pj_log (log.c:419)
==26641==    by 0x56CF92: invoke_log (errno.c:223)
==26641==    by 0x56D062: pj_perror_imp (errno.c:244)
==26641==    by 0x56D127: pj_perror (errno.c:252)
==26641==    by 0x4059EE: on_app_started (main.c:32)
==26641==    by 0x409040: pjsua_app_run (pjsua_app.c:1915)
==26641==    by 0x405ABC: main_func (main.c:110)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)

Helgrind issue #5

Data race over the cp->used_size variable in pool_caching.
Status: won't fix; the variable is used for statistics purposes only.

==26641== Lock at 0x817F90 was first observed
==26641==    at 0x4C31DDA: pthread_mutex_init (hg_intercepts.c:443)
==26641==    by 0x567730: init_mutex (os_core_unix.c:1139)
==26641==    by 0x567909: pj_mutex_create (os_core_unix.c:1192)
==26641==    by 0x56EB58: create_mutex_lock (lock.c:75)
==26641==    by 0x56EBB3: pj_lock_create_simple_mutex (lock.c:89)
==26641==    by 0x571AAE: pj_caching_pool_init (pool_caching.c:81)
==26641==    by 0x42ACF9: pjsua_create (pjsua_core.c:815)
==26641==    by 0x407E0C: app_init (pjsua_app.c:1288)
==26641==    by 0x408F1A: pjsua_app_init (pjsua_app.c:1881)
==26641==    by 0x405AA9: main_func (main.c:108)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)
==26641==    by 0x405B32: main (main.c:129)
==26641==
==26641== Possible data race during write of size 8 at 0x817DA0 by thread #1
==26641== Locks held: 1, at address 0x817F90
==26641==    at 0x57224E: cpool_on_block_free (pool_caching.c:332)
==26641==    by 0x577841: default_block_free (pool_policy_malloc.c:67)
==26641==    by 0x571625: reset_pool (pool.c:254)
==26641==    by 0x5716C6: pj_pool_destroy_int (pool.c:292)
==26641==    by 0x571EDA: cpool_release_pool (pool_caching.c:238)
==26641==    by 0x571137: pj_pool_release (pool_i.h:92)
==26641==    by 0x409281: app_destroy (pjsua_app.c:2007)
==26641==    by 0x409304: pjsua_app_destroy (pjsua_app.c:2035)
==26641==    by 0x405ADF: main_func (main.c:116)
==26641==    by 0x568914: pj_run_app (os_core_unix.c:1930)
==26641==    by 0x405B32: main (main.c:129)
==26641==
==26641== This conflicts with a previous write of size 8 by thread #7
==26641== Locks held: none
==26641==    at 0x57224E: cpool_on_block_free (pool_caching.c:332)
==26641==    by 0x577841: default_block_free (pool_policy_malloc.c:67)
==26641==    by 0x571625: reset_pool (pool.c:254)
==26641==    by 0x5716A7: pj_pool_reset (pool.c:276)
==26641==    by 0x477597: udp_on_read_complete (sip_transport_udp.c:221)

Helgrind issue #6

Data race over the tcp->pending_connect variable.
Status: should be safe, but a code modification to suppress the warning would be desirable.

==5950== Lock at 0x6CAE988 was first observed
==5950==    at 0x4C2DDF1: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5950==    by 0x50AEE1: init_mutex (os_core_unix.c:1157)
==5950==    by 0x50B789: pj_mutex_create (os_core_unix.c:1211)
==5950==    by 0x427C06: pjsua_create (pjsua_core.c:817)
==5950==    by 0x408553: app_init (pjsua_app.c:1292)
==5950==    by 0x40A780: pjsua_app_init (pjsua_app.c:1885)
==5950==    by 0x407A0B: main_func (main.c:108)
==5950==    by 0x64F476C: (below main) (libc-start.c:226)
==5950==
==5950== Lock at 0x6D42E58 was first observed
==5950==    at 0x4C2DDF1: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5950==    by 0x50AEE1: init_mutex (os_core_unix.c:1157)
==5950==    by 0x50B789: pj_mutex_create (os_core_unix.c:1211)
==5950==    by 0x5112BE: create_mutex_lock (lock.c:75)
==5950==    by 0x5117DB: pj_grp_lock_create (lock.c:438)
==5950==    by 0x511B40: pj_grp_lock_create_w_handler (lock.c:463)
==5950==    by 0x46D575: tsx_create (sip_transaction.c:1014)
==5950==    by 0x46E2B5: pjsip_tsx_create_uac2 (sip_transaction.c:1309)
==5950==    by 0x46F144: pjsip_endpt_send_request (sip_util_statefull.c:103)
==5950==    by 0x440F73: pjsip_regc_send (sip_reg.c:1424)
==5950==    by 0x418408: pjsua_acc_set_registration (pjsua_acc.c:2579)
==5950==    by 0x41C164: pjsua_acc_add (pjsua_acc.c:487)
==5950==
==5950== Lock at 0x6D45728 was first observed
==5950==    at 0x4C2DDF1: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5950==    by 0x50AEE1: init_mutex (os_core_unix.c:1157)
==5950==    by 0x50B789: pj_mutex_create (os_core_unix.c:1211)
==5950==    by 0x5112BE: create_mutex_lock (lock.c:75)
==5950==    by 0x5117DB: pj_grp_lock_create (lock.c:438)
==5950==    by 0x463DE7: tcp_create.constprop.3 (sip_transport_tcp.c:683)
==5950==    by 0x46441E: lis_create_transport (sip_transport_tcp.c:1021)
==5950==    by 0x4606A4: pjsip_tpmgr_acquire_transport2 (sip_transport.c:2040)
==5950==    by 0x4165C1: pjsua_acc_get_uac_addr.part.6 (pjsua_acc.c:3151)
==5950==    by 0x417FB7: pjsua_acc_create_uac_contact (pjsua_acc.c:3233)
==5950==    by 0x418670: pjsua_acc_set_registration (pjsua_acc.c:2334)
==5950==    by 0x41C164: pjsua_acc_add (pjsua_acc.c:487)
==5950==
==5950== Possible data race during read of size 4 at 0x6D43E20 by thread #1
==5950== Locks held: 2, at addresses 0x6CAE988 0x6D42E58
==5950==    at 0x463652: tcp_send_msg (sip_transport_tcp.c:1246)
==5950==    by 0x45F93A: pjsip_transport_send (sip_transport.c:833)
==5950==    by 0x45B0FD: stateless_send_transport_cb (sip_util.c:1243)
==5950==    by 0x45B439: stateless_send_resolver_callback (sip_util.c:1344)
==5950==    by 0x45E09E: pjsip_resolve (sip_resolve.c:306)
==5950==    by 0x45D16F: pjsip_endpt_send_request_stateless (sip_util.c:1388)
==5950==    by 0x46C087: tsx_send_msg (sip_transaction.c:2129)
==5950==    by 0x46C223: tsx_on_state_null (sip_transaction.c:2360)
==5950==    by 0x46ED35: pjsip_tsx_send_msg (sip_transaction.c:1737)
==5950==    by 0x46F1AE: pjsip_endpt_send_request (sip_util_statefull.c:117)
==5950==    by 0x440F73: pjsip_regc_send (sip_reg.c:1424)
==5950==    by 0x418408: pjsua_acc_set_registration (pjsua_acc.c:2579)
==5950==
==5950== This conflicts with a previous write of size 4 by thread #7
==5950== Locks held: 1, at address 0x6D45728
==5950==    at 0x46380A: on_connect_complete (sip_transport_tcp.c:1423)
==5950==    by 0x509C05: ioqueue_dispatch_write_event (ioqueue_common_abs.c:280)
==5950==    by 0x50A82E: pj_ioqueue_poll (ioqueue_select.c:969)
==5950==    by 0x459D8A: pjsip_endpt_handle_events2 (sip_endpoint.c:741)
==5950==    by 0x426946: pjsua_handle_events (pjsua_core.c:1833)
==5950==    by 0x426EB9: worker_thread (pjsua_core.c:696)
==5950==    by 0x50BD89: thread_main (os_core_unix.c:541)
==5950==    by 0x4C2DC3D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5950==
==5950== Address 0x6D43E20 is 416 bytes inside a block of size 6144 alloc'd
==5950==    at 0x4C2B87D: malloc (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5950==    by 0x51BEAE: default_block_alloc (pool_policy_malloc.c:46)
==5950==    by 0x5130B5: pj_pool_allocate_find (pool.c:60)
==5950==    by 0x5131EC: pj_pool_calloc (pool_i.h:69)
==5950==    by 0x463BC8: tcp_create.constprop.3 (pool.h:476)
==5950==    by 0x46441E: lis_create_transport (sip_transport_tcp.c:1021)
==5950==    by 0x4606A4: pjsip_tpmgr_acquire_transport2 (sip_transport.c:2040)
==5950==    by 0x4165C1: pjsua_acc_get_uac_addr.part.6 (pjsua_acc.c:3151)
==5950==    by 0x417FB7: pjsua_acc_create_uac_contact (pjsua_acc.c:3233)
==5950==    by 0x418670: pjsua_acc_set_registration (pjsua_acc.c:2334)
==5950==    by 0x41C164: pjsua_acc_add (pjsua_acc.c:487)
==5950==    by 0x408DCF: app_init (pjsua_app.c:1813)