Changes between Initial Version and Version 1 of PJNATH_Memory_Usage


Timestamp:
Jun 3, 2010 4:07:08 PM (14 years ago)
Author:
bennylp
{{{
#!html
<!-- MAIN TABLE START -->
<table border=0 width="90%" align="center"><tr><td>
}}}

= PJNATH RAM Usage Analysis and Optimization =

'''Table of Contents'''
[[PageOutline(2-3,,inline)]]

[[BR]]

This page explains how to analyze the heap usage of PJNATH and how to tune various settings to reduce it.

[[BR]]

== Scope ==

We will use the [http://www.pjsip.org/pjsua.htm pjsua] application for this purpose, with all NAT traversal features enabled: STUN, ICE, and TURN.

Memory usage of non-PJNATH components is neither shown nor calculated.

The tests were run on a Linux x86_64 machine. The default 64-bit alignment may increase the memory usage slightly. The executables were built with optimization turned off (because I needed to do some debugging at the same time), but I don't think this matters for heap usage.

I'm using PJSIP version 1.6-trunk (r3189 to be exact).

[[BR]]

== How to Calculate the Memory Usage ==

You can enter "'''dd'''" in pjsua to dump memory usage to the screen and log. For this test, though, I've instrumented the application to dump memory usage at certain points (with {{{pjsip_endpt_dump()}}}) to make the results more consistent.

[[BR]]

== Call Setup ==

All tests are done on the caller side.

Caller:

{{{
 $ ./pjsua-x86_64-unknown-linux-gnu --null-audio --max-calls=1 \
                                    --stun-srv=stun.pjsip.org --use-ice \
                                    --dis-codec \* --add-codec pcmu \
                                    --log-file mem.log \
                                    --use-turn --turn-srv=turn.pjsip.org:33478 \
                                    --turn-user xxx --turn-passwd xxx
}}}
 - has two components (RTP and RTCP) and three candidates for each component (host, STUN, and TURN) in the SDP offer.

Callee:

{{{
 $ ./pjsua-x86_64-unknown-linux-gnu --null-audio --local-port 5080 \
                                    --max-calls 1 --use-ice \
                                    --stun-srv=stun.pjsip.org
}}}
 - has two components (RTP and RTCP) and one host candidate in the SDP answer.

[[BR]]

== Checkpoints ==

Memory usage calculations are taken at the following checkpoints:

 1. Startup, after TURN allocation:
    - this shows the ICE objects, along with some transmit data buffers for the TURN allocation.
 1. After pjsua_start():
    - this shows the additional objects created by the NAT type checker. It gives a good indication of how much memory is used during startup.
 1. Idle after initialization:
    - memory usage should decrease once the NAT type checker and the initial transmit data buffers have been cleared. This shows the idle memory usage.
 1. Right after making outgoing call:
    - this shows the additional objects created for the session.
 1. ICE negotiation is complete:
    - this would '''probably''' show the peak memory usage, as many ICE connectivity checks are still kept in memory.
    - Warning though: that might not be true. If connectivity checks have been running for a long time (say more than 7 seconds), some objects may already have been cleaned up.
 1. 1 minute into call:
    - memory usage should decrease as ICE connectivity checks are done.

[[BR]]

== Running with Default Settings ==

Here are the results:

|| ||Used (bytes)||Allocated (bytes)||Utilization %||
||1) Startup, after TURN allocation||41,968||58,672||72||
||2) After pjsua_start()||46,728||66,792||70||
||3) Idle after initialization||33,744||46,528||73||
||4) Right after making outgoing call||43,936||61,280||72||
||5) ICE negotiation is complete||55,008||75,960||72||
||6) 1 minute into call||44,568||61,792||72||
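The Utilization column is the used bytes divided by the allocated bytes; the difference between the two is memory that the pools have obtained but not handed out. A minimal C sketch of that arithmetic follows. The helper names are hypothetical and are not part of PJLIB or PJNATH.

```c
#include <assert.h>

/* Hypothetical helper (not a PJLIB API): percentage of the allocated pool
 * memory that is actually in use, rounded to the nearest integer. */
unsigned utilization_pct(unsigned long used, unsigned long allocated)
{
    assert(allocated > 0 && used <= allocated);
    return (unsigned)((used * 100UL + allocated / 2) / allocated);
}

/* Hypothetical helper: bytes obtained by the pools but not handed out. */
unsigned long unused_bytes(unsigned long used, unsigned long allocated)
{
    assert(used <= allocated);
    return allocated - used;
}
```

For the first row of the table, {{{utilization_pct(41968, 58672)}}} gives 72 and {{{unused_bytes(41968, 58672)}}} gives 16,704 bytes.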

I suppose we can improve on that! The sections below discuss how.

[[BR]]

== Optimizing the Memory Usage ==

The methods below only cover optimizations for PJNATH. For more general memory usage optimization methods, please see the [wiki:FAQ#Footprint footprint discussion in the FAQ].

These are the methods that can be used to reduce memory usage.

=== Reduce the size of the packet buffers ===

Sample optimized values for the affected settings (with their previous default values in the comments):
{{{
#define PJ_STUN_SOCK_PKT_LEN                    (160+200)               /* 2000 */
#define PJ_STUN_MAX_PKT_LEN                     PJ_STUN_SOCK_PKT_LEN    /*  800 */
#define PJ_TURN_MAX_PKT_LEN                     PJ_STUN_MAX_PKT_LEN     /* 3000 */
}}}

Each STUN and TURN socket/session allocates that much memory for its buffers, so the savings from reducing these are significant.

Note:
 - (160+200): 160 is for a 20ms PCMA/PCMU frame, and 200 is for the additional STUN/TURN headers in case the frame needs to be transported encapsulated (200 is probably too big, but I haven't checked the actual size).

Warning:
 - reducing the buffer size limits the size of the packets you can send and receive.
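To see where the 160 in (160+200) comes from, and what the warning means in practice, here is a small C sketch. It assumes G.711 (PCMA/PCMU) at 8000 samples per second with one byte per sample, and it reuses the 200-byte allowance from the note above, which the note itself calls a guess. The helper names are hypothetical and are not PJNATH APIs.

```c
#include <assert.h>
#include <stddef.h>

/* G.711 (PCMA/PCMU): 8000 samples per second, one byte per sample,
 * which is 8 bytes per millisecond of audio. */
#define G711_BYTES_PER_MS 8u

/* Allowance for STUN/TURN encapsulation, taken from the note above.
 * The text says 200 is unverified, so treat it as an assumption. */
#define ENCAP_ALLOWANCE 200u

/* Hypothetical helper: payload bytes of one G.711 frame. */
size_t g711_frame_bytes(unsigned ptime_ms)
{
    assert(ptime_ms > 0);
    return (size_t)ptime_ms * G711_BYTES_PER_MS;
}

/* Hypothetical helper: smallest buffer that fits one frame plus allowance. */
size_t min_pkt_len(unsigned ptime_ms)
{
    return g711_frame_bytes(ptime_ms) + ENCAP_ALLOWANCE;
}

/* Returns non-zero when a buffer of pkt_len bytes is large enough. */
int pkt_len_is_enough(size_t pkt_len, unsigned ptime_ms)
{
    return pkt_len >= min_pkt_len(ptime_ms);
}
```

With these assumptions, a 20ms frame needs 160 + 200 = 360 bytes, so the sample setting fits exactly. A 30ms frame would need 440 bytes and would no longer fit, which is the kind of limit the warning refers to.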

=== Limit the number of ICE candidates, checks, components, etc. ===

Sample optimized values for the affected settings (with their default values in the comments):
{{{
#define PJ_ICE_ST_MAX_CAND             4                       /* 8 */
#define PJ_ICE_COMP_BITS               0                       /* 1 */
#define PJ_ICE_MAX_CAND                (PJ_ICE_ST_MAX_CAND*2)  /* 16 */
#define PJ_ICE_MAX_CHECKS              (PJ_ICE_ST_MAX_CAND*PJ_ICE_ST_MAX_CAND) /* 32 */
}}}

Warning:
 - reducing these constants may make it impossible to run on particular hosts (e.g. when the host has too many interfaces)
 - ... or to talk to certain peers (when they have too many candidates in their SDPs).
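The derived values above expand to 8 candidates and 16 checks. The sketch below reproduces that arithmetic and shows how a host with many interfaces can exceed the reduced limit. It assumes that the limit applies to the candidates of one component and that each local address contributes one host candidate; {{{local_candidates_fit()}}} is a hypothetical helper, not a PJNATH function.

```c
#include <assert.h>

/* Sample values copied from the settings above. */
#define PJ_ICE_ST_MAX_CAND  4
#define PJ_ICE_MAX_CAND     (PJ_ICE_ST_MAX_CAND*2)
#define PJ_ICE_MAX_CHECKS   (PJ_ICE_ST_MAX_CAND*PJ_ICE_ST_MAX_CAND)

/* Hypothetical helper: does one component's candidate list fit?
 * Assumes one host candidate per local address, plus at most one STUN
 * (server reflexive) and one TURN (relayed) candidate. */
int local_candidates_fit(unsigned host_addrs, int use_stun, int use_turn)
{
    unsigned n = host_addrs + (use_stun ? 1u : 0u) + (use_turn ? 1u : 0u);
    assert(host_addrs > 0);
    return n <= PJ_ICE_ST_MAX_CAND;
}
```

Under these assumptions, one interface with STUN and TURN gives three candidates and fits, while three interfaces with STUN and TURN give five and do not.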

=== Reduce Log Verbosity ===

Suggested setting:
{{{
#define PJ_LOG_MAX_LEVEL               4   /* 5 */
}}}

Turning off level 5 logging turns off message tracing in the STUN session, which saves 1000 bytes per STUN session!

=== Optimize the Pool Size ===

Using smaller pool sizes reduces the memory wasted, at the expense of more calls to ''malloc()''.

For a very lazy optimization, just set all pool sizes to 128 (or lower!).

Warning:
 - your app will run slower if you set the pool sizes to smaller values.
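The trade-off between wasted memory and the number of ''malloc()'' calls can be illustrated with a deliberately simplified pool model in C. This is not how PJLIB pools are implemented (it ignores block headers and alignment, and it abandons the leftover room of a block as soon as a new one is needed); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t used;       /* bytes handed out to callers         */
    size_t allocated;  /* bytes obtained from the system      */
    size_t mallocs;    /* number of blocks that were obtained */
} pool_stats;

/* Simplified model: start with one block of init_len bytes. When a
 * request does not fit in the current block, obtain a new block of
 * inc_len bytes (or of the request size, if that is larger). */
pool_stats simulate_pool(const size_t *req, size_t count,
                         size_t init_len, size_t inc_len)
{
    pool_stats st = { 0, 0, 1 };
    size_t room = init_len;
    size_t i;

    assert(req != NULL && init_len > 0 && inc_len > 0);
    st.allocated = init_len;

    for (i = 0; i < count; ++i) {
        if (req[i] > room) {
            size_t blk = req[i] > inc_len ? req[i] : inc_len;
            st.allocated += blk;
            st.mallocs++;
            room = blk;
        }
        room -= req[i];
        st.used += req[i];
    }
    return st;
}
```

For requests of 100, 300, 50 and 200 bytes, this model allocates 1024 bytes in 2 blocks with 512-byte pools, and 756 bytes in 4 blocks with 128-byte pools: less waste, but twice as many allocations.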

=== All Settings ===

These are the combined settings based on the methods above. Copy and paste these into your '''{{{config_site.h}}}''':

{{{
/* To reduce socket buffers */
#define PJ_STUN_SOCK_PKT_LEN                    (160+200)               /* 2000 */
#define PJ_STUN_MAX_PKT_LEN                     PJ_STUN_SOCK_PKT_LEN    /*  800 */
#define PJ_TURN_MAX_PKT_LEN                     PJ_STUN_MAX_PKT_LEN     /* 3000 */

/* Reduce the size of the respective sessions */
#define PJ_ICE_ST_MAX_CAND                      4                       /* 8 */
#define PJ_ICE_COMP_BITS                        0                       /* 1 */
#define PJ_ICE_MAX_CAND                         (PJ_ICE_ST_MAX_CAND*2)  /* 16 */
#define PJ_ICE_MAX_CHECKS                       (PJ_ICE_ST_MAX_CAND*PJ_ICE_ST_MAX_CAND) /* 32 */

/* Log level < 5 frees up 1000 bytes of buffer in the STUN session! */
#define PJ_LOG_MAX_LEVEL                        4                       /* 5 */

/* A lazy pool memory usage optimization.. */
#define PJNATH_POOL_LEN_ICE_SESS                128
#define PJNATH_POOL_INC_ICE_SESS                128
#define PJNATH_POOL_LEN_ICE_STRANS              128
#define PJNATH_POOL_INC_ICE_STRANS              128
#define PJNATH_POOL_LEN_NATCK                   128
#define PJNATH_POOL_INC_NATCK                   128
#define PJNATH_POOL_LEN_STUN_SESS               128
#define PJNATH_POOL_INC_STUN_SESS               128
#define PJNATH_POOL_LEN_STUN_TDATA              128
#define PJNATH_POOL_INC_STUN_TDATA              128

#define PJNATH_POOL_LEN_TURN_SESS               128
#define PJNATH_POOL_INC_TURN_SESS               128
#define PJNATH_POOL_LEN_TURN_SOCK               128
#define PJNATH_POOL_INC_TURN_SOCK               128
}}}

[[BR]]

== Optimized Results ==

The results, after using the {{{config_site.h}}} settings above:

|| ||Used (bytes)||Allocated (bytes)||Utilization %||
||1) App initialization, after TURN allocation||21,488||25,216||85||
||2) After pjsua_start()||24,440||28,800||85||
||3) Idle after initialization||15,568||18,048||86||
||4) Right after making outgoing call||21,032||24,064||87||
||5) After ICE negotiation is complete||25,368||29,312||87||
||6) 1 minute into call||21,464||24,320||88||

[[BR]]

== Conclusion ==

After the optimization, peak allocated memory is around 29KB per call, compared to 75KB previously. I think that's not bad! But please continue reading the warnings below. ;-)
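To put a number on the improvement, the sketch below computes the relative reduction between the default and the optimized runs from the figures in the two result tables. {{{reduction_pct()}}} is a hypothetical helper, not part of any PJSIP library.

```c
#include <assert.h>

/* Hypothetical helper: relative reduction from `before` to `after`,
 * in percent, rounded to the nearest integer. */
unsigned reduction_pct(unsigned long before, unsigned long after)
{
    assert(before > 0 && after <= before);
    return (unsigned)(((before - after) * 100UL + before / 2) / before);
}
```

Applied to the peak checkpoint (ICE negotiation complete), allocated memory drops from 75,960 to 29,312 bytes, a reduction of about 61%. The idle figure (46,528 to 18,048 bytes allocated) drops by about the same proportion.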

[[BR]]

== Warnings ==

 - please see all the other warnings above.
 - the number of candidates varies from host to host, so the memory usage will vary too.

[[BR]]

= Appendix =

[[BR]]

== Crash Course on ICE ==

Let me briefly explain how ICE in PJNATH works, so that you can understand where the memory is used. It is also a good idea to read the [http://www.pjsip.org/pjnath/docs/html/index.htm PJNATH manual]. For each object mentioned below, I also give, in square brackets, the memory pool name format so that you can recognize it in the memory dump output later. For example, "STUN session {{{[stuntp%p]}}}" means the STUN session uses a memory pool whose name is formatted with a printf-like "stuntp%p" format, e.g. "stuntp0x12345678". So here it goes.

[[BR]]

=== Objects Created During Startup ===

==== ICE Media Transports ====

These objects are created during PJSUA-LIB initialization, and will be kept alive throughout.

If ICE is enabled, each call requires one ''PJMEDIA ICE media transport'' {{{[icetp%p]}}}, which in turn creates one ''ICE stream transport'' {{{[icetp%p]}}}. Each of these has two ICE components by default (i.e. the RTP and RTCP components). For each component, one ''STUN socket transport'' {{{[stuntp%p]}}} and one ''TURN socket transport'' {{{[udprel%p]}}} are created.

Each ''STUN socket transport'' in turn creates one ''STUN session'', which creates another pool for its incoming packet buffer. All of these use the {{{[stuntp%p]}}} pool name format.

Each ''TURN socket transport'' creates one ''TURN session'', which in turn creates one ''STUN session'', along with its incoming packet buffer. All of these use the {{{[udprel%p]}}} pool name format.

Sample dump output:

{{{
..
 19:04:16.888       cachpool              icetp00:      344 of      512 (67%) used
 19:04:16.888       cachpool              icetp00:     1848 of     1920 (96%) used
 19:04:16.888       cachpool      stuntp0x8218e88:     1248 of     1792 (69%) used
 19:04:16.889       cachpool      stuntp0x8218e88:      784 of      896 (87%) used
 19:04:16.889       cachpool      stuntp0x8218e88:      416 of      512 (81%) used
 19:04:16.889       cachpool      udprel0x822de60:     1184 of     1408 (84%) used
 19:04:16.889       cachpool      udprel0x822de60:     1816 of     1920 (94%) used
 19:04:16.889       cachpool      udprel0x822de60:      872 of      896 (97%) used
 19:04:16.889       cachpool      udprel0x822de60:      368 of      384 (95%) used
 19:04:16.889       cachpool      stuntp0x822f800:     1248 of     1792 (69%) used
 19:04:16.889       cachpool      stuntp0x822f800:      784 of      896 (87%) used
 19:04:16.889       cachpool      stuntp0x822f800:      416 of      512 (81%) used
 19:04:16.889       cachpool      udprel0x82308e8:     1184 of     1408 (84%) used
 19:04:16.889       cachpool      udprel0x82308e8:     1816 of     1920 (94%) used
 19:04:16.889       cachpool      udprel0x82308e8:      872 of      896 (97%) used
 19:04:16.889       cachpool      udprel0x82308e8:      368 of      384 (95%) used
..
}}}

Note that all the objects in the above memory dump belong to just a single ICE media transport!
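Totals like the ones in the result tables can be obtained by summing the "used" and "allocated" figures of every pool line in such a dump. The C sketch below does this for lines in the format shown above; the type and function names are hypothetical, and the parser assumes exactly the "<time> cachpool <name>: <used> of <allocated>" layout of the sample.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    unsigned long used;       /* sum of the "used" figures      */
    unsigned long allocated;  /* sum of the "allocated" figures */
    unsigned pools;           /* number of pool lines seen      */
} dump_totals;

/* Hypothetical helper: add one dump line to the running totals.
 * Returns 1 if the line looked like a pool entry, 0 otherwise. */
int add_dump_line(dump_totals *t, const char *line)
{
    char name[64];
    unsigned long used, cap;

    assert(t != NULL && line != NULL);
    /* Skip the timestamp and the "cachpool" tag, then read
     * "<name>: <used> of <allocated>". */
    if (sscanf(line, " %*s %*s %63[^:]: %lu of %lu", name, &used, &cap) != 3)
        return 0;

    t->used += used;
    t->allocated += cap;
    t->pools++;
    return 1;
}
```

Feeding all sixteen pool lines of the sample dump through this function gives 15,568 bytes used of 18,048 allocated, which matches the "Idle after initialization" row of the optimized results table.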

==== NAT Type Checker ====

The library also performs NAT type detection to assist NAT-related troubleshooting. This test runs briefly (approximately ten seconds), and its objects are cleaned up after that. The NAT type detector's pool format is {{{[natck%p]}}}.

==== Transmit Data Buffers ====

Each outgoing STUN packet allocates one {{{[tdata%p]}}} pool. Normally these buffers are kept for a few seconds because of retransmissions.

 ''Note: the SIP transmit buffer is named {{{[tdta%p]}}}. Did you notice the difference?''

Sample dump of objects related to the NAT type checker:

{{{
..
 19:04:05.499       cachpool       natck0x8233c80:     1200 of     1280 (93%) used
 19:04:05.499       cachpool       natck0x8233c80:      784 of      896 (87%) used
 19:04:05.499       cachpool       natck0x8233c80:      416 of      512 (81%) used
 19:04:05.499       cachpool       tdata0x8234ad0:      888 of     1152 (77%) used
 19:04:05.499       cachpool       tdata0x8234f70:      888 of     1152 (77%) used
 19:04:05.499       cachpool       tdata0x822da48:      888 of     1152 (77%) used
..
}}}

Do you remember which object is which?
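As a memory aid, the pool name prefixes mentioned on this page can be mapped back to the objects that own them. The lookup below is a hypothetical helper written for this page, not a PJNATH function, and the descriptions are paraphrased from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: describe a pool by the prefix of its name. */
const char *pool_kind(const char *pool_name)
{
    static const struct { const char *prefix; const char *kind; } map[] = {
        { "icetp",  "ICE media/stream transport" },
        { "stuntp", "STUN socket transport" },
        { "udprel", "TURN socket transport" },
        { "natck",  "NAT type checker" },
        { "tdata",  "STUN transmit data buffer" },
        { "tdta",   "SIP transmit data buffer" },
        { "stuse",  "STUN session created during a call" }
    };
    size_t i;

    assert(pool_name != NULL);
    for (i = 0; i < sizeof(map) / sizeof(map[0]); ++i) {
        if (strncmp(pool_name, map[i].prefix, strlen(map[i].prefix)) == 0)
            return map[i].kind;
    }
    return "unknown";
}
```

Note that "tdata" and "tdta" do not share a full prefix, so the STUN and SIP transmit buffers are told apart correctly.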

[[BR]]

=== Objects Created During Call ===

For each call, an ''ICE session'' will be created; judging from the sample dump below, it shows up under the {{{[icetp%p]}}} pool name. Then several individual ''STUN sessions'' {{{[stuse%p]}}} will be created, one for each route. Recall that ICE works by ''pairing'' every ''local candidate'' with each ''remote candidate'', creating N x M possible routes. The ICE spec defines a mechanism to reduce the number of possible routes, but each remaining route still needs to be checked, and each check requires sending a request and waiting for the response.

Sample memory dump with three local candidates and two remote candidates:

{{{
..
 19:04:33.681       cachpool              icetp00:     3928 of     3968 (98%) used
 19:04:33.681       cachpool       stuse0x8231bd8:      784 of      896 (87%) used
 19:04:33.681       cachpool       stuse0x82303c0:      376 of      384 (97%) used
 19:04:33.687       cachpool       stuse0x82304c8:      784 of      896 (87%) used
 19:04:33.687       cachpool       stuse0x82326b8:      104 of      256 (40%) used
 19:04:33.687       cachpool       tdata0x822da48:      912 of     1024 (89%) used
 19:04:33.687       cachpool       tdata0x823f658:     1144 of     1280 (89%) used
 19:04:33.687       cachpool       tdata0x823fc08:     1144 of     1280 (89%) used
 19:04:33.687       cachpool       tdata0x82401b8:     1080 of     1280 (84%) used
..
}}}
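The N x M pairing described above gives an upper bound on the number of connectivity checks before ICE prunes any pairs. The C sketch below computes that bound and compares it against a check limit such as {{{PJ_ICE_MAX_CHECKS}}}. Both helpers are hypothetical; the sketch does not model ICE pruning, nor what PJNATH actually does when the limit is exceeded.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on the candidate pairs (routes)
 * when every local candidate is paired with every remote candidate. */
unsigned candidate_pairs(unsigned n_local, unsigned n_remote)
{
    return n_local * n_remote;
}

/* Hypothetical helper: does the unpruned pair count fit the limit? */
int pairs_within_limit(unsigned n_local, unsigned n_remote,
                       unsigned max_checks)
{
    assert(max_checks > 0);
    return candidate_pairs(n_local, n_remote) <= max_checks;
}
```

Three local and two remote candidates give at most 6 pairs, which fits both the default limit of 32 and the reduced limit of 16. Eight candidates on each side would give 64 unpruned pairs, which exceeds even the default.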

{{{
#!html
<!-- MAIN TABLE END -->
</td></tr></table>
}}}
     341