= PJMEDIA MIPS Measurement =

This page shows the typical performance characteristics of various PJMEDIA components, which may be useful when evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, as many PJMEDIA compilation flags and compiler switches can be set to increase (or decrease) the performance.

== Test Method ==

Each test measures the overall performance for both directions. For example, the resampling test shows the total upsample and downsample time in a single test, and a codec test shows the total encoding and decoding time.

The test program depends on the CPU_MHZ and MIPS values of the processor being set correctly at compilation time. We used the MIPS information in the following links to assume the MIPS value of the processor:
 - http://en.wikipedia.org/wiki/Million_instructions_per_second
 - http://en.wikipedia.org/wiki/ARM_architecture

To measure the MIPS score of a component, the program measures the time needed to process 1 second worth of audio samples using that component, then calculates the MIPS score based on the configured MIPS value of the processor. Because of this, the calculated MIPS shouldn't be interpreted as a ''real'' MIPS value: it is based purely on time measurement, our assumed MIPS value for the processor may be wrong, and on platforms where floating point is available, floating-point instructions will be used instead.

The test uses strictly one thread. All the results below were obtained with the default settings that come with the PJSIP distribution.

The test source code is available in the '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).

== Interpreting the Results ==

=== Columns ===

Besides the Item column, which names the component being tested, there are four columns in the result table:

 '''Clock Rate:''' ::
   This shows the sampling rate of the component being tested, in KHz. We test both 8KHz and 16KHz.
   Most components can work at both 8KHz and 16KHz, hence there are two test result rows for the same component, each with a different clock rate. Some components (mostly codec-related) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz), hence there is only one test result row for these components.
 '''Time (usec):''' ::
   This shows the time elapsed to process 1 second worth of audio samples, in microseconds.
 '''CPU (%):''' ::
   This shows how much CPU (in percent) this component will consume when run in real time. The value is derived from the time measurement above. For example, if the time elapsed is 1 second, then this component will take 100% of the CPU time when run in real time, and if the time elapsed is 0.5 second, it will take 50% of the CPU time. The CPU percentage may be larger than 100% if processing 1 second worth of audio samples takes more than 1 second, which can happen when the test is run on a slower processor.
 '''MIPS:''' ::
   The MIPS (Million Instructions per Second) score roughly indicates how many million instructions this component executes per second when run in real time. The value is derived from the time measurement above and calculated based on the assumed MIPS value of the processor. Once again, the score may be incorrect for many reasons, so it shouldn't be interpreted as an official/definitive score, and in particular one MUST NOT use the MIPS score to compare the performance of different processor families/architectures.

=== Rows ===

Each row shows the measurement result of a particular component. The components tested are described below.

 '''get from memplayer:''' ::
   The memory/buffer based player port supplies the audio samples for almost all of the tests, so its time is included as overhead in those tests.
 '''conference bridge with N call(s):''' ::
   This measures the performance of the conference bridge with N calls.
   Note that we don't use actual calls for the test, since we only want to measure the conference bridge performance and not codec performance (which is measured in separate tests). Instead, a memplayer stands in for each "call" to supply audio to the bridge. During the test all the calls (ports) are connected to port zero, and port zero is connected to all calls. No connections are created among the calls.
 '''upsample+downsample:''' ::
   This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half so that the clock rate is the same as the original. This test measures linear resampling as well as non-linear resampling using the small filter and the large filter. Some resampling backends may not support selecting between linear/non-linear and small/large filter; in that case the results will be equal for all settings.
 '''WSOLA PLC - N% loss:''' ::
   This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packets (a.k.a. Packet Loss Concealment/PLC). Timings for various loss percentages are shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, the sound port, and the conference bridge to adapt to audio bursts and clock drift.
 '''WSOLA discard N% excess:''' ::
   This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timings for various excess percentages are shown.
 '''echo canceller Nms tail len:''' ::
   This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from the memplayer, and there is no acoustic delay in the AEC input.
 '''tone generator with single/dual freq:''' ::
   This measures the performance of the tone generator when continuously generating a single or dual frequency tone for 1 second.
 '''codec encode/decode:''' ::
   This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
 '''stream TX/RX:''' ::
   This test is intended to measure the performance/overhead of the stream, which consists of codec, RTP/RTCP processing, and de-jitter buffering. In addition, it also tests the performance of Secure RTP (SRTP) for various setting combinations and codec bandwidth. Since this test also includes codec processing (encoding and decoding), you need to subtract the result of the corresponding codec test from it to obtain the overhead of the stream and SRTP alone.

== Results ==

=== PJSIP-0.9.0, Linux, ARM9 (ARM926EJ-S), gcc ===

||Hardware:||Olimex SAM9-L9260 board||
||Platform:||Linux 2.6.23||
||Processor:||ARM926EJ-S rev 5 (v5l)||
||Speed:||180 MHz||
||Assumed MIPS:||198 MIPS||
||BogoMIPS:||98.91||
||Compilation:||arm-926-linux-gnu-gcc -O2 -msoft-float -DNDEBUG -DPJ_HAS_FLOATING_POINT=0||
||gcc:|| version 4.2.1 --with-cpu=arm926ej-s -march=armv5te -msoft-float --with-float=soft ||

Result:
{{{
00:59:38.531 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=180Mhz, 198.0 MIPS
Clock Item                                       Time      CPU    MIPS
 Rate                                          (usec)      (%)
----------------------------------------------------------------------
 8KHz get from memplayer                          181    0.018    0.04
 8KHz conference bridge with 1 call              6682    0.668    1.32
 8KHz conference bridge with 2 calls            11943    1.194    2.36
 8KHz conference bridge with 4 calls            22402    2.240    4.44
 8KHz conference bridge with 8 calls            42969    4.297    8.51
 8KHz conference bridge with 16 calls           83328    8.333   16.50
 8KHz upsample+downsample - linear               5815    0.581    1.15
 8KHz upsample+downsample - small filter        66786    6.679   13.22
 8KHz upsample+downsample - large filter       870754   87.075  172.41
 8KHz WSOLA PLC - 0% loss                         605    0.060    0.12
 8KHz WSOLA PLC - 2% loss                        1004    0.100    0.20
 8KHz WSOLA PLC - 5% loss                        1541    0.154    0.31
 8KHz WSOLA PLC - 10% loss                       1803    0.180    0.36
 8KHz WSOLA PLC - 20% loss                       3102    0.310    0.61
 8KHz WSOLA PLC - 50% loss                       8431    0.843    1.67
 8KHz WSOLA discard 2% excess                     214    0.021    0.04
 8KHz WSOLA discard 5% excess                     488    0.049    0.10
 8KHz WSOLA discard 10% excess                   1178    0.118    0.23
 8KHz WSOLA discard 20% excess                   2009    0.201    0.40
 8KHz WSOLA discard 50% excess                   6432    0.643    1.27
 8KHz echo canceller 100ms tail len            335870   33.587   66.50
 8KHz echo canceller 128ms tail len            336225   33.623   66.57
 8KHz echo canceller 200ms tail len            349240   34.924   69.15
 8KHz echo canceller 256ms tail len            363206   36.321   71.91
 8KHz echo canceller 400ms tail len            400026   40.003   79.21
 8KHz echo canceller 500ms tail len            426646   42.665   84.48
 8KHz echo canceller 512ms tail len            432291   43.229   85.59
 8KHz echo canceller 600ms tail len            454965   45.496   90.08
 8KHz echo canceller 800ms tail len            516487   51.649  102.26
 8KHz tone generator with single freq             920    0.092    0.18
 8KHz tone generator with dual freq              1428    0.143    0.28
 8KHz codec encode/decode - G.711                2701    0.270    0.53
 8KHz codec encode/decode - GSM                 75750    7.575   15.00
 8KHz codec encode/decode - iLBC              2856203  285.620  565.53
 8KHz codec encode/decode - Speex 8Khz         436162   43.616   86.36
 8KHz codec encode/decode - L16/8000/1           1704    0.170    0.34
 8KHz stream TX/RX - G.711                       6786    0.679    1.34
 8KHz stream TX/RX - G.711 SRTP 32bit           21688    2.169    4.29
 8KHz stream TX/RX - G.711 SRTP 32bit +auth     33501    3.350    6.63
 8KHz stream TX/RX - G.711 SRTP 80bit           21725    2.172    4.30
 8KHz stream TX/RX - G.711 SRTP 80bit +auth     33551    3.355    6.64
 8KHz stream TX/RX - GSM                        82035    8.203   16.24
 8KHz stream TX/RX - GSM SRTP 32bit             90890    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 32bit + auth      99334    9.933   19.67
 8KHz stream TX/RX - GSM SRTP 80bit             90893    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 80bit + auth      99356    9.936   19.67
16KHz get from memplayer                          239    0.024    0.05
16KHz conference bridge with 1 call             12780    1.278    2.53
16KHz conference bridge with 2 calls            23052    2.305    4.56
16KHz conference bridge with 4 calls            43174    4.317    8.55
16KHz conference bridge with 8 calls            82096    8.210   16.26
16KHz conference bridge with 16 calls          158565   15.856   31.40
16KHz upsample+downsample - linear              11469    1.147    2.27
16KHz upsample+downsample - small filter       133088   13.309   26.35
16KHz upsample+downsample - large filter      1739742  173.974  344.47
16KHz WSOLA PLC - 0% loss                         980    0.098    0.19
16KHz WSOLA PLC - 2% loss                        1910    0.191    0.38
16KHz WSOLA PLC - 5% loss                        3734    0.373    0.74
16KHz WSOLA PLC - 10% loss                       7867    0.787    1.56
16KHz WSOLA PLC - 20% loss                      13007    1.301    2.58
16KHz WSOLA PLC - 50% loss                      29022    2.902    5.75
16KHz WSOLA discard 2% excess                     551    0.055    0.11
16KHz WSOLA discard 5% excess                    1027    0.103    0.20
16KHz WSOLA discard 10% excess                   1973    0.197    0.39
16KHz WSOLA discard 20% excess                  10454    1.045    2.07
16KHz WSOLA discard 50% excess                  22276    2.228    4.41
16KHz echo canceller 100ms tail len            664649   66.465  131.60
16KHz echo canceller 128ms tail len            682686   68.269  135.17
16KHz echo canceller 200ms tail len            720924   72.092  142.74
16KHz echo canceller 256ms tail len            752928   75.293  149.08
16KHz echo canceller 400ms tail len            877528   87.753  173.75
16KHz echo canceller 500ms tail len            970559   97.056  192.17
16KHz echo canceller 512ms tail len            989839   98.984  195.99
16KHz echo canceller 600ms tail len           1065465  106.547  210.96
16KHz echo canceller 800ms tail len           1285075  128.508  254.44
16KHz tone generator with single freq            1617    0.162    0.32
16KHz tone generator with dual freq              2632    0.263    0.52
16KHz codec encode/decode - G.722              148080   14.808   29.32
16KHz codec encode/decode - Speex 16Khz        979202   97.920  193.88
16KHz codec encode/decode - L16/16000/1          3244    0.324    0.64
16KHz stream TX/RX - G.722                     155685   15.568   30.83
}}}

=== PJSIP-0.9.0, Linux, Pentium3, gcc ===

||Hardware:||IBM X21 Notebook||
||Platform:||Linux 2.6.23||
||Processor:||Pentium III||
||Speed:||700 MHz||
||Assumed MIPS:|| 1895.6 MIPS||
||BogoMIPS:|| 1395.36 ||
||Compilation:|| -O3 -march=pentium3 -fomit-frame-pointer -DNDEBUG ||
||gcc:|| version 4.2.3||

Result:
{{{
02:01:45.561 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=700Mhz, 1895.6 MIPS
Clock Item                                       Time      CPU    MIPS
 Rate                                          (usec)      (%)
----------------------------------------------------------------------
 8KHz get from memplayer                           23    0.002    0.04
 8KHz conference bridge with 1 call               800    0.080    1.52
 8KHz conference bridge with 2 calls             1395    0.140    2.64
 8KHz conference bridge with 4 calls             2522    0.252    4.78
 8KHz conference bridge with 8 calls             4704    0.470    8.92
 8KHz conference bridge with 16 calls            9146    0.915   17.34
 8KHz upsample+downsample - linear                589    0.059    1.12
 8KHz upsample+downsample - small filter         9563    0.956   18.13
 8KHz upsample+downsample - large filter        46644    4.664   88.42
 8KHz WSOLA PLC - 0% loss                         107    0.011    0.20
 8KHz WSOLA PLC - 2% loss                         240    0.024    0.45
 8KHz WSOLA PLC - 5% loss                         466    0.047    0.88
 8KHz WSOLA PLC - 10% loss                        524    0.052    0.99
 8KHz WSOLA PLC - 20% loss                        958    0.096    1.82
 8KHz WSOLA PLC - 50% loss                       2667    0.267    5.06
 8KHz WSOLA discard 2% excess                      57    0.006    0.11
 8KHz WSOLA discard 5% excess                     142    0.014    0.27
 8KHz WSOLA discard 10% excess                    364    0.036    0.69
 8KHz WSOLA discard 20% excess                    631    0.063    1.20
 8KHz WSOLA discard 50% excess                   2081    0.208    3.94
 8KHz echo canceller 100ms tail len             40050    4.005   75.92
 8KHz echo canceller 128ms tail len             33179    3.318   62.89
 8KHz echo canceller 200ms tail len             35161    3.516   66.65
 8KHz echo canceller 256ms tail len             37470    3.747   71.03
 8KHz echo canceller 400ms tail len             45104    4.510   85.50
 8KHz echo canceller 500ms tail len             50504    5.050   95.74
 8KHz echo canceller 512ms tail len             50940    5.094   96.56
 8KHz echo canceller 600ms tail len             56113    5.611  106.37
 8KHz echo canceller 800ms tail len             71677    7.168  135.87
 8KHz tone generator with single freq            1758    0.176    3.33
 8KHz tone generator with dual freq              3506    0.351    6.65
 8KHz codec encode/decode - G.711                 357    0.036    0.68
 8KHz codec encode/decode - GSM                 11382    1.138   21.58
 8KHz codec encode/decode - iLBC                46894    4.689   88.89
 8KHz codec encode/decode - Speex 8Khz          64428    6.443  122.13
 8KHz codec encode/decode - L16/8000/1            248    0.025    0.47
 8KHz stream TX/RX - G.711                        617    0.062    1.17
 8KHz stream TX/RX - G.711 SRTP 32bit            1751    0.175    3.32
 8KHz stream TX/RX - G.711 SRTP 32bit +auth      3161    0.316    5.99
 8KHz stream TX/RX - G.711 SRTP 80bit            1773    0.177    3.36
 8KHz stream TX/RX - G.711 SRTP 80bit +auth      3108    0.311    5.89
 8KHz stream TX/RX - GSM                        11755    1.176   22.28
 8KHz stream TX/RX - GSM SRTP 32bit             12439    1.244   23.58
 8KHz stream TX/RX - GSM SRTP 32bit + auth      13285    1.329   25.18
 8KHz stream TX/RX - GSM SRTP 80bit             12270    1.227   23.26
 8KHz stream TX/RX - GSM SRTP 80bit + auth      13358    1.336   25.32
16KHz get from memplayer                           27    0.003    0.05
16KHz conference bridge with 1 call              1522    0.152    2.89
16KHz conference bridge with 2 calls             2711    0.271    5.14
16KHz conference bridge with 4 calls             4772    0.477    9.05
16KHz conference bridge with 8 calls             8913    0.891   16.90
16KHz conference bridge with 16 calls           18759    1.876   35.56
16KHz upsample+downsample - linear               1136    0.114    2.15
16KHz upsample+downsample - small filter        19231    1.923   36.45
16KHz upsample+downsample - large filter        93066    9.307  176.42
16KHz WSOLA PLC - 0% loss                         177    0.018    0.34
16KHz WSOLA PLC - 2% loss                         534    0.053    1.01
16KHz WSOLA PLC - 5% loss                        1165    0.116    2.21
16KHz WSOLA PLC - 10% loss                       2796    0.280    5.30
16KHz WSOLA PLC - 20% loss                       4515    0.451    8.56
16KHz WSOLA PLC - 50% loss                      10482    1.048   19.87
16KHz WSOLA discard 2% excess                     168    0.017    0.32
16KHz WSOLA discard 5% excess                     326    0.033    0.62
16KHz WSOLA discard 10% excess                    654    0.065    1.24
16KHz WSOLA discard 20% excess                   3526    0.353    6.68
16KHz WSOLA discard 50% excess                   7507    0.751   14.23
16KHz echo canceller 100ms tail len             68547    6.855  129.94
16KHz echo canceller 128ms tail len             72619    7.262  137.66
16KHz echo canceller 200ms tail len             78054    7.805  147.96
16KHz echo canceller 256ms tail len             84739    8.474  160.63
16KHz echo canceller 400ms tail len            107738   10.774  204.23
16KHz echo canceller 500ms tail len            129879   12.988  246.20
16KHz echo canceller 512ms tail len            133796   13.380  253.62
16KHz echo canceller 600ms tail len            152166   15.217  288.45
16KHz echo canceller 800ms tail len            205415   20.542  389.38
16KHz tone generator with single freq            3489    0.349    6.61
16KHz tone generator with dual freq              6996    0.700   13.26
16KHz codec encode/decode - G.722               32803    3.280   62.18
16KHz codec encode/decode - Speex 16Khz        156629   15.663  296.91
16KHz codec encode/decode - L16/16000/1           434    0.043    0.82
16KHz stream TX/RX - G.722                      20959    2.096   39.73
}}}

=== PJSIP-0.9.0, Windows XP, Pentium 4, Visual Studio 2005 ===

||Hardware:||HP PC||
||Platform:||Windows XP SP2||
||Processor:||Pentium 4 (single core, no Hyper-Threading)||
||Speed:||2.6 GHz||
||Assumed MIPS:|| 8102 MIPS||
||BogoMIPS:|| - ||
||Compilation:|| Default Release settings (/O2) ||
||Compiler:|| Visual Studio 2005 ||

Result:
{{{
09:46:14.571 os_core_win32. pjlib 0.9.0-trunk for win32 initialized
MIPS test, with CPU=2666Mhz, 8102.0 MIPS
Clock Item                                       Time      CPU    MIPS
 Rate                                          (usec)      (%)
----------------------------------------------------------------------
 8KHz get from memplayer                           11    0.001    0.09
 8KHz conference bridge with 1 call               337    0.034    2.73
 8KHz conference bridge with 2 calls              512    0.051    4.15
 8KHz conference bridge with 4 calls              919    0.092    7.45
 8KHz conference bridge with 8 calls             1658    0.166   13.43
 8KHz conference bridge with 16 calls            3180    0.318   25.76
 8KHz upsample+downsample - linear                288    0.029    2.33
 8KHz upsample+downsample - small filter         7822    0.782   63.37
 8KHz upsample+downsample - large filter        38386    3.839  311.00
 8KHz WSOLA PLC - 0% loss                          53    0.005    0.43
 8KHz WSOLA PLC - 2% loss                          61    0.006    0.49
 8KHz WSOLA PLC - 5% loss                         103    0.010    0.83
 8KHz WSOLA PLC - 10% loss                        152    0.015    1.23
 8KHz WSOLA PLC - 20% loss                        195    0.020    1.58
 8KHz WSOLA PLC - 50% loss                        520    0.052    4.21
 8KHz WSOLA discard 2% excess                       8    0.001    0.06
 8KHz WSOLA discard 5% excess                      27    0.003    0.22
 8KHz WSOLA discard 10% excess                     74    0.007    0.60
 8KHz WSOLA discard 20% excess                    117    0.012    0.95
 8KHz WSOLA discard 50% excess                    370    0.037    3.00
 8KHz echo canceller 100ms tail len             20945    2.095  169.70
 8KHz echo canceller 128ms tail len             20484    2.048  165.96
 8KHz echo canceller 200ms tail len             21017    2.102  170.28
 8KHz echo canceller 256ms tail len             21562    2.156  174.69
 8KHz echo canceller 400ms tail len             23030    2.303  186.59
 8KHz echo canceller 500ms tail len             24102    2.410  195.27
 8KHz echo canceller 512ms tail len             24441    2.444  198.02
 8KHz echo canceller 600ms tail len             25380    2.538  205.63
 8KHz echo canceller 800ms tail len             28751    2.875  232.94
 8KHz tone generator with single freq              84    0.008    0.68
 8KHz tone generator with dual freq               125    0.013    1.01
 8KHz codec encode/decode - G.711                 135    0.014    1.09
 8KHz codec encode/decode - GSM                  6898    0.690   55.89
 8KHz codec encode/decode - iLBC                39783    3.978  322.32
 8KHz codec encode/decode - Speex 8Khz          24543    2.454  198.85
 8KHz codec encode/decode - L16/8000/1            161    0.016    1.30
 8KHz stream TX/RX - G.711                        298    0.030    2.41
 8KHz stream TX/RX - G.711 SRTP 32bit             633    0.063    5.13
 8KHz stream TX/RX - G.711 SRTP 32bit +auth      1063    0.106    8.61
 8KHz stream TX/RX - G.711 SRTP 80bit             634    0.063    5.14
 8KHz stream TX/RX - G.711 SRTP 80bit +auth      1066    0.107    8.64
 8KHz stream TX/RX - GSM                         7182    0.718   58.19
 8KHz stream TX/RX - GSM SRTP 32bit              7353    0.735   59.57
 8KHz stream TX/RX - GSM SRTP 32bit + auth       7693    0.769   62.33
 8KHz stream TX/RX - GSM SRTP 80bit              7313    0.731   59.25
 8KHz stream TX/RX - GSM SRTP 80bit + auth       7673    0.767   62.17
16KHz get from memplayer                            8    0.001    0.06
16KHz conference bridge with 1 call               592    0.059    4.80
16KHz conference bridge with 2 calls              907    0.091    7.35
16KHz conference bridge with 4 calls             1620    0.162   13.13
16KHz conference bridge with 8 calls             3055    0.306   24.75
16KHz conference bridge with 16 calls            5799    0.580   46.98
16KHz upsample+downsample - linear                560    0.056    4.54
16KHz upsample+downsample - small filter        15505    1.551  125.62
16KHz upsample+downsample - large filter        76944    7.694  623.40
16KHz WSOLA PLC - 0% loss                          52    0.005    0.42
16KHz WSOLA PLC - 2% loss                         263    0.026    2.13
16KHz WSOLA PLC - 5% loss                         113    0.011    0.92
16KHz WSOLA PLC - 10% loss                        383    0.038    3.10
16KHz WSOLA PLC - 20% loss                        742    0.074    6.01
16KHz WSOLA PLC - 50% loss                       1757    0.176   14.24
16KHz WSOLA discard 2% excess                       9    0.001    0.07
16KHz WSOLA discard 5% excess                      69    0.007    0.56
16KHz WSOLA discard 10% excess                    220    0.022    1.78
16KHz WSOLA discard 20% excess                    403    0.040    3.27
16KHz WSOLA discard 50% excess                   1301    0.130   10.54
16KHz echo canceller 100ms tail len             42084    4.208  340.96
16KHz echo canceller 128ms tail len             42697    4.270  345.93
16KHz echo canceller 200ms tail len             43782    4.378  354.72
16KHz echo canceller 256ms tail len             45008    4.501  364.65
16KHz echo canceller 400ms tail len             49519    4.952  401.20
16KHz echo canceller 500ms tail len             51945    5.194  420.86
16KHz echo canceller 512ms tail len             52492    5.249  425.29
16KHz echo canceller 600ms tail len             54984    5.498  445.48
16KHz echo canceller 800ms tail len             60065    6.006  486.65
16KHz tone generator with single freq             161    0.016    1.30
16KHz tone generator with dual freq               239    0.024    1.94
16KHz codec encode/decode - G.722                9354    0.935   75.79
16KHz codec encode/decode - Speex 16Khz         51086    5.109  413.90
16KHz codec encode/decode - L16/16000/1           304    0.030    2.46
16KHz stream TX/RX - G.722                       9570    0.957   77.54
}}}
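
== Reproducing the Derived Columns ==

The CPU (%) and MIPS columns are both derived from the Time column, and the stream/SRTP overhead is obtained by subtracting the matching codec row from a stream row. The sketch below illustrates that arithmetic in Python, checked against a few ARM9 rows from the results on this page. It is not taken from {{{mips_test.c}}}, which may compute the values differently (for example with integer arithmetic), and the function names are hypothetical helpers introduced only for this illustration.

```python
# Illustrative helpers only; these names do not exist in PJMEDIA.

USEC_PER_SEC = 1_000_000


def cpu_percent(time_usec: float) -> float:
    """CPU usage (%) implied by the time needed to process 1 second of audio."""
    return time_usec / USEC_PER_SEC * 100.0


def mips_score(time_usec: float, assumed_cpu_mips: float) -> float:
    """MIPS score: the fraction of the CPU used, scaled by the assumed CPU MIPS."""
    return time_usec / USEC_PER_SEC * assumed_cpu_mips


def stream_overhead_usec(stream_usec: int, codec_usec: int) -> int:
    """Stream (and SRTP) overhead: stream time minus the matching codec time."""
    return stream_usec - codec_usec


if __name__ == "__main__":
    ARM9_ASSUMED_MIPS = 198.0  # "Assumed MIPS" from the ARM9 result table

    # ARM9, 8KHz, "conference bridge with 1 call": 6682 usec
    print(f"{cpu_percent(6682):.3f}")                    # 0.668
    print(f"{mips_score(6682, ARM9_ASSUMED_MIPS):.2f}")  # 1.32

    # ARM9, 8KHz: "stream TX/RX - G.711" minus "codec encode/decode - G.711"
    print(stream_overhead_usec(6786, 2701))              # 4085
```

On the ARM9 board, for example, this puts the plain G.711 stream overhead (RTP/RTCP processing and de-jitter buffering) at roughly 4085 usec per second of audio, or about 0.41% CPU.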