= PJMEDIA MIPS Measurement =

This page shows the CPU requirements, measured in MIPS (Million Instructions Per Second), of various PJMEDIA components, which is useful for evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, since many compilation flags in PJMEDIA, as well as compiler switches, can be set to improve performance. All the results below were obtained with the stock settings that come with the PJSIP distribution. The test source is in the {{{pjmedia/src/test}}} directory.

== Test Method ==

Each test measures the overall performance for both directions. For example, the resampling test shows the total upsample and downsample time in a single result, and the codec test shows the total encoding and decoding time.

The test program depends on the MIPS value of the processor being set correctly at compilation time.

The test strictly uses one thread only.

== Interpreting the Results ==

=== Columns ===

There are four columns in the result table:

 '''Clock Rate:''' :: This shows the sampling rate of the component being tested, in Hz. We test both 8KHz and 16KHz. Most components can work at both 8KHz and 16KHz, hence there will be two result rows for the same component, each with a different clock rate. Some components (mostly codecs) can only work at one clock rate (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz).
 '''Time (usec):''' :: This shows the time elapsed to process 1 second worth of audio samples, in microseconds.
 '''CPU (%):''' :: From the elapsed time above, we can calculate how much CPU is needed to run this component in real-time. For example, if the elapsed time is 1 second (one million microseconds), the component will take 100% of the CPU time when run in real-time. If the elapsed time is 0.5 second (500 thousand microseconds), the component will take 50% of the CPU time when run in real-time.
   The CPU percentage may be larger than 100% if processing 1 second worth of audio samples takes more than 1 second.
 '''MIPS:''' :: Also from the elapsed time above, we can calculate the MIPS needed to run this component in real-time, since we know (or can assume) the MIPS value of the processor.

=== Rows ===

Each row shows the measurement result of a particular component. The components tested are described below.

 '''get from memplayer:''' :: The memory/buffer based player port supplies the audio samples for almost all of the tests, so its time is included as overhead in all tests.
 '''conference bridge with N call(s):''' :: This measures the performance of the conference bridge with N calls. Note that we don't use actual calls for the test, since we only want to measure the conference bridge performance and not the codec performance (which is measured in separate tests). So for this test we use a memplayer for each ''call'' to supply audio to the bridge. During the test, all the calls (ports) are connected to port zero and port zero is connected to all calls. No connection is created among the calls themselves.
 '''upsample+downsample:''' :: This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half again so that the final clock rate is the same as the original. This test measures linear resampling and polyphase resampling using the small filter and the large filter.
 '''WSOLA PLC - N% loss:''' :: This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate replacement audio for lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, the sound port, and the conference bridge to adapt to audio bursts and clock drift.
 '''WSOLA discard N% excess:''' :: This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timing for various excess percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, the sound port, and the conference bridge to adapt to audio bursts and clock drift.
 '''echo canceller Nms tail len:''' :: This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from the memplayer, and there is no acoustic delay in the audio samples.
 '''tone generator with single/dual freq:''' :: This measures the performance of the tone generator when continuously generating a single or dual frequency tone.
 '''codec encode/decode:''' :: This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
 '''stream TX/RX:''' :: This test is intended to measure the performance/overhead of the stream, which consists of RTP/RTCP processing and de-jitter buffering. In addition, it also tests the performance of Secure RTP (SRTP) for various combinations of settings and codec bandwidth. Since this test also includes codec processing (actual encoding and decoding), you need to subtract the result of the corresponding codec test from the stream result to get the overhead of the stream and SRTP only.

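The relationship between the Time, CPU (%) and MIPS columns described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the test program's own code: the function names are hypothetical, and the formulas are inferred from the column descriptions (they do reproduce the published figures, e.g. 6682 usec on a processor assumed to deliver 198 MIPS gives 0.668% CPU and 1.32 MIPS).

```python
USEC_PER_SECOND = 1_000_000


def cpu_percent(elapsed_usec: float) -> float:
    """CPU share (in %) needed to process 1 second of audio in real time.

    elapsed_usec is the value from the "Time (usec)" column.
    """
    return elapsed_usec / USEC_PER_SECOND * 100.0


def required_mips(elapsed_usec: float, processor_mips: float) -> float:
    """MIPS needed in real time, given the assumed MIPS rating of the CPU.

    processor_mips is an assumption supplied by whoever runs the test.
    """
    return elapsed_usec / USEC_PER_SECOND * processor_mips


if __name__ == "__main__":
    # Figures taken from the "conference bridge with 1 call" row at 8KHz.
    elapsed = 6682
    print(round(cpu_percent(elapsed), 3))
    print(round(required_mips(elapsed, 198.0), 2))
```

A result above 100% (for example an elapsed time of 2,000,000 usec giving 200%) simply means the component cannot run in real time on that processor with these settings.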
== Results ==

=== PJSIP-0.9.0, ARM9 (ARM926EJ-S) Linux ===

||Hardware:||Olimex SAM9-L9260 board||
||Platform:||Linux 2.6.23||
||Processor:||ARM926EJ-S rev 5 (v5l)||
||Speed:||180 MHz||
||Assumed MIPS:||198 MIPS||
||BogoMIPS:||98.91||
||Compilation:||arm-926-linux-gnu-gcc -O2 -msoft-float -DNDEBUG -DPJ_HAS_FLOATING_POINT=0||

Result:

{{{
00:59:38.531 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=180Mhz, 198.0 MIPS
Clock Item                                      Time      CPU    MIPS
 Rate                                         (usec)      (%)
----------------------------------------------------------------------
 8KHz get from memplayer                         181    0.018    0.04
 8KHz conference bridge with 1 call             6682    0.668    1.32
 8KHz conference bridge with 2 calls           11943    1.194    2.36
 8KHz conference bridge with 4 calls           22402    2.240    4.44
 8KHz conference bridge with 8 calls           42969    4.297    8.51
 8KHz conference bridge with 16 calls          83328    8.333   16.50
 8KHz upsample+downsample - linear              5815    0.581    1.15
 8KHz upsample+downsample - small filter       66786    6.679   13.22
 8KHz upsample+downsample - large filter      870754   87.075  172.41
 8KHz WSOLA PLC - 0% loss                        605    0.060    0.12
 8KHz WSOLA PLC - 2% loss                       1004    0.100    0.20
 8KHz WSOLA PLC - 5% loss                       1541    0.154    0.31
 8KHz WSOLA PLC - 10% loss                      1803    0.180    0.36
 8KHz WSOLA PLC - 20% loss                      3102    0.310    0.61
 8KHz WSOLA PLC - 50% loss                      8431    0.843    1.67
 8KHz WSOLA discard 2% excess                    214    0.021    0.04
 8KHz WSOLA discard 5% excess                    488    0.049    0.10
 8KHz WSOLA discard 10% excess                  1178    0.118    0.23
 8KHz WSOLA discard 20% excess                  2009    0.201    0.40
 8KHz WSOLA discard 50% excess                  6432    0.643    1.27
 8KHz echo canceller 100ms tail len           335870   33.587   66.50
 8KHz echo canceller 128ms tail len           336225   33.623   66.57
 8KHz echo canceller 200ms tail len           349240   34.924   69.15
 8KHz echo canceller 256ms tail len           363206   36.321   71.91
 8KHz echo canceller 400ms tail len           400026   40.003   79.21
 8KHz echo canceller 500ms tail len           426646   42.665   84.48
 8KHz echo canceller 512ms tail len           432291   43.229   85.59
 8KHz echo canceller 600ms tail len           454965   45.496   90.08
 8KHz echo canceller 800ms tail len           516487   51.649  102.26
 8KHz tone generator with single freq            920    0.092    0.18
 8KHz tone generator with dual freq             1428    0.143    0.28
 8KHz codec encode/decode - G.711               2701    0.270    0.53
 8KHz codec encode/decode - GSM                75750    7.575   15.00
 8KHz codec encode/decode - iLBC             2856203  285.620  565.53
 8KHz codec encode/decode - Speex 8Khz        436162   43.616   86.36
 8KHz codec encode/decode - L16/8000/1          1704    0.170    0.34
 8KHz stream TX/RX - G.711                      6786    0.679    1.34
 8KHz stream TX/RX - G.711 SRTP 32bit          21688    2.169    4.29
 8KHz stream TX/RX - G.711 SRTP 32bit +auth    33501    3.350    6.63
 8KHz stream TX/RX - G.711 SRTP 80bit          21725    2.172    4.30
 8KHz stream TX/RX - G.711 SRTP 80bit +auth    33551    3.355    6.64
 8KHz stream TX/RX - GSM                       82035    8.203   16.24
 8KHz stream TX/RX - GSM SRTP 32bit            90890    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 32bit + auth     99334    9.933   19.67
 8KHz stream TX/RX - GSM SRTP 80bit            90893    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 80bit + auth     99356    9.936   19.67
16KHz get from memplayer                         239    0.024    0.05
16KHz conference bridge with 1 call            12780    1.278    2.53
16KHz conference bridge with 2 calls           23052    2.305    4.56
16KHz conference bridge with 4 calls           43174    4.317    8.55
16KHz conference bridge with 8 calls           82096    8.210   16.26
16KHz conference bridge with 16 calls         158565   15.856   31.40
16KHz upsample+downsample - linear             11469    1.147    2.27
16KHz upsample+downsample - small filter      133088   13.309   26.35
16KHz upsample+downsample - large filter     1739742  173.974  344.47
16KHz WSOLA PLC - 0% loss                        980    0.098    0.19
16KHz WSOLA PLC - 2% loss                       1910    0.191    0.38
16KHz WSOLA PLC - 5% loss                       3734    0.373    0.74
16KHz WSOLA PLC - 10% loss                      7867    0.787    1.56
16KHz WSOLA PLC - 20% loss                     13007    1.301    2.58
16KHz WSOLA PLC - 50% loss                     29022    2.902    5.75
16KHz WSOLA discard 2% excess                    551    0.055    0.11
16KHz WSOLA discard 5% excess                   1027    0.103    0.20
16KHz WSOLA discard 10% excess                  1973    0.197    0.39
16KHz WSOLA discard 20% excess                 10454    1.045    2.07
16KHz WSOLA discard 50% excess                 22276    2.228    4.41
16KHz echo canceller 100ms tail len           664649   66.465  131.60
16KHz echo canceller 128ms tail len           682686   68.269  135.17
16KHz echo canceller 200ms tail len           720924   72.092  142.74
16KHz echo canceller 256ms tail len           752928   75.293  149.08
16KHz echo canceller 400ms tail len           877528   87.753  173.75
16KHz echo canceller 500ms tail len           970559   97.056  192.17
16KHz echo canceller 512ms tail len           989839   98.984  195.99
16KHz echo canceller 600ms tail len          1065465  106.547  210.96
16KHz echo canceller 800ms tail len          1285075  128.508  254.44
16KHz tone generator with single freq           1617    0.162    0.32
16KHz tone generator with dual freq             2632    0.263    0.52
16KHz codec encode/decode - G.722             148080   14.808   29.32
16KHz codec encode/decode - Speex 16Khz       979202   97.920  193.88
16KHz codec encode/decode - L16/16000/1         3244    0.324    0.64
16KHz stream TX/RX - G.722                    155685   15.568   30.83
}}}
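
As noted in the description of the ''stream TX/RX'' rows, the stream figures include codec processing, so the codec time has to be subtracted to isolate the stream and SRTP overhead. The following Python sketch applies that subtraction to a few of the 8KHz elapsed times reported above. The dictionary layout, the variant labels ({{{plain}}}, {{{srtp32}}}, {{{srtp32+auth}}}) and the function name are hypothetical and chosen only for this illustration.

```python
# Elapsed times in usec, copied from the 8KHz rows of the ARM9 results.
CODEC_USEC = {"G.711": 2701, "GSM": 75750}

# Keys are (codec, variant); the variant labels are made up for this sketch.
STREAM_USEC = {
    ("G.711", "plain"): 6786,
    ("G.711", "srtp32"): 21688,
    ("G.711", "srtp32+auth"): 33501,
    ("GSM", "plain"): 82035,
    ("GSM", "srtp32"): 90890,
    ("GSM", "srtp32+auth"): 99334,
}


def stream_overhead_usec(codec: str, variant: str) -> int:
    """Stream (and SRTP) time with the codec encode/decode time removed."""
    return STREAM_USEC[(codec, variant)] - CODEC_USEC[codec]


if __name__ == "__main__":
    for codec, variant in STREAM_USEC:
        overhead = stream_overhead_usec(codec, variant)
        print(f"{codec:6} {variant:12} {overhead:6d} usec")
```

With these inputs the plain G.711 stream overhead comes out at 4085 usec and the plain GSM stream overhead at 6285 usec per second of audio. The two values are not identical, so the subtraction should be read as an estimate rather than an exact measurement.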