Version 5 (modified by bennylp, 16 years ago)
PJMEDIA MIPS Measurement
This page shows CPU requirement/MIPS (million instructions per second) measurements for various PJMEDIA components, which may be useful for evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures: the tests measure elapsed time rather than the actual number of instructions executed, and many PJMEDIA compilation flags and compiler switches can be set to improve performance.
Test Method
Each test measures the overall performance in both directions. For example, the resampling test reports the total upsampling and downsampling time in a single result, and a codec test reports the total encoding and decoding time.
The test program depends on the MIPS value of the processor being set correctly at compile time. The test strictly uses a single thread.
To measure the MIPS value of a component, the program calculates the time to process 1 second worth of audio samples using that component, then calculates the MIPS value based on the configured MIPS value of the processor.
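The calculation described above can be sketched as follows. This is a minimal illustration and not the code from mips_test.c; the function name `mips_from_elapsed` and its parameters are hypothetical. It assumes the processor's MIPS rating is supplied as a configured constant.

```python
def mips_from_elapsed(elapsed_usec: float, cpu_mips: float) -> float:
    """Estimate the MIPS a component needs to run in real time.

    elapsed_usec: time taken to process 1 second of audio, in microseconds.
    cpu_mips: the configured (assumed) MIPS rating of the processor.
    """
    # Fraction of one real-time second spent inside the component.
    busy_fraction = elapsed_usec / 1_000_000.0
    # The same fraction of the processor's capacity is the estimated MIPS.
    return busy_fraction * cpu_mips


# Sample input taken from the ARM9 results on this page:
# conference bridge with 1 call at 8KHz, 6682 usec, 198 MIPS assumed.
print(round(mips_from_elapsed(6682, 198.0), 2))
```

For the 6682 usec sample this gives about 1.32 MIPS, in line with the corresponding row of the ARM9 results. Because the estimate scales directly with the configured MIPS value, an incorrect setting shifts every result by the same factor.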
All the results below were obtained with the stock settings that come with the PJSIP distribution. The test source code is available in the pjmedia/src/test directory (mips_test.c file).
Interpreting the Results
Columns
There are four columns in the result table:
- Clock Rate:
- This shows the sampling rate of the component being tested, in Hz. We test both 8KHz and 16KHz. Most components can work at both 8KHz and 16KHz, so there are two result rows for the same component, each with a different clock rate. Some components (mostly codecs) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz).
- Time (usec):
- This shows the time elapsed to process 1 second worth of audio samples, in microseconds.
- CPU (%):
- From the elapsed time above, we can calculate how much CPU is needed to run this component in real time. For example, if the elapsed time is 1 second (one million microseconds), this component takes 100% of the CPU time when run in real time. If the elapsed time is 0.5 second (500 thousand microseconds), it takes 50% of the CPU time.
The CPU percentage may be larger than 100% if processing 1 second worth of audio samples takes more than 1 second.
- MIPS:
- Also from the elapsed time above, we can measure the MIPS needed to run this component in real-time, since we know (or we can assume) the MIPS value of the processor.
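The CPU (%) column and the 100% real-time limit can be illustrated with a short sketch. The helper names `cpu_percent` and `fits_real_time` are hypothetical and chosen only for this example; the sample inputs are the figures quoted in the column description plus the ARM9 iLBC codec row from the results on this page.

```python
def cpu_percent(elapsed_usec: float) -> float:
    """CPU usage (%) needed to process 1 second of audio in real time."""
    # One million microseconds of work per second of audio means 100%.
    return elapsed_usec / 1_000_000.0 * 100.0


def fits_real_time(elapsed_usec: float) -> bool:
    """True if the component can keep up using a single thread."""
    return cpu_percent(elapsed_usec) <= 100.0


for usec in (500_000, 1_000_000, 2_856_203):
    print(usec, round(cpu_percent(usec), 3), fits_real_time(usec))
```

A value above 100%, as in the last sample, means the component cannot keep up with real time on that processor under the tested settings.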
Rows
Each row shows the measurement result for a particular component. The components tested are described below.
- get from memplayer:
- The memory/buffer based player port supplies the audio samples for almost all of the tests, so its time adds as overhead for all tests.
- conference bridge with N call(s):
- This measures the performance of the conference bridge with N calls. Note that the test does not use actual calls, since we only want to measure conference bridge performance and not codec performance (which is measured in separate tests). Instead, a memplayer supplies audio to the bridge for each call. During the test, every call (port) is connected to port zero and port zero is connected to every call. No connections are created among the calls.
- upsample+downsample:
- This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half so that the clock rate is the same as the original. This test measures linear resampling and polyphase resampling with a small filter and a large filter.
- WSOLA PLC - N% loss:
- This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate replacements for lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and conference bridge to adapt to audio bursts and clock drift.
- WSOLA discard N% excess:
- This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timing for various excess percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and conference bridge to adapt to audio bursts and clock drift.
- echo canceller Nms tail len:
- This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the audio samples.
- tone generator with single/dual freq:
- This measures the performance of the tone generator when it continuously generates a single-frequency or dual-frequency tone.
- codec encode/decode:
- This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
- stream TX/RX:
- This test is intended to measure the performance/overhead of the stream, which consists of RTP/RTCP processing and de-jitter buffering. It also tests the performance of Secure RTP (SRTP) for various combinations of settings and codec bandwidth. Since the test also includes codec processing (actual encoding and decoding), you need to subtract the result of the corresponding codec test to obtain the overhead of the stream and SRTP alone.
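The subtraction described for the stream TX/RX rows can be sketched as below. The function name `overhead_usec` is hypothetical; the input values are the G.711 rows at 8KHz from the ARM9 results on this page. The sketch assumes the codec cost inside the stream test equals the standalone codec result, which is an approximation.

```python
def overhead_usec(combined_usec: int, baseline_usec: int) -> int:
    """Time attributable to the extra layer, in microseconds.

    combined_usec: elapsed time of the test that includes the extra layer.
    baseline_usec: elapsed time of the same test without that layer.
    """
    return combined_usec - baseline_usec


codec_g711 = 2701           # codec encode/decode - G.711
stream_g711 = 6786          # stream TX/RX - G.711
stream_g711_srtp32 = 21688  # stream TX/RX - G.711 SRTP 32bit

# Stream overhead alone (RTP/RTCP processing and de-jitter buffering).
print(overhead_usec(stream_g711, codec_g711))
# Additional cost of SRTP with a 32-bit tag on top of the plain stream.
print(overhead_usec(stream_g711_srtp32, stream_g711))
```

With these inputs the stream itself accounts for 4085 usec per second of audio, and SRTP adds a further 14902 usec. Treat such differences as rough estimates, since the two tests being compared are separate timing runs.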
Results
PJSIP-0.9.0, Linux, ARM9 (ARM926EJ-S), gcc
Hardware: | Olimex SAM9-L9260 board |
Platform: | Linux 2.6.23 |
Processor: | ARM926EJ-S rev 5 (v5l) |
Speed: | 180 MHz |
Assumed MIPS: | 198 MIPS |
BogoMIPS: | 98.91 |
Compilation: | arm-926-linux-gnu-gcc -O2 -msoft-float -DNDEBUG -DPJ_HAS_FLOATING_POINT=0 |
gcc: | version 4.2.1 --with-cpu=arm926ej-s -march=armv5te -msoft-float --with-float=soft |
Result:
00:59:38.531 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=180Mhz, 198.0 MIPS

Clock Rate | Item | Time (usec) | CPU (%) | MIPS |
---|---|---|---|---|
8KHz | get from memplayer | 181 | 0.018 | 0.04 |
8KHz | conference bridge with 1 call | 6682 | 0.668 | 1.32 |
8KHz | conference bridge with 2 calls | 11943 | 1.194 | 2.36 |
8KHz | conference bridge with 4 calls | 22402 | 2.240 | 4.44 |
8KHz | conference bridge with 8 calls | 42969 | 4.297 | 8.51 |
8KHz | conference bridge with 16 calls | 83328 | 8.333 | 16.50 |
8KHz | upsample+downsample - linear | 5815 | 0.581 | 1.15 |
8KHz | upsample+downsample - small filter | 66786 | 6.679 | 13.22 |
8KHz | upsample+downsample - large filter | 870754 | 87.075 | 172.41 |
8KHz | WSOLA PLC - 0% loss | 605 | 0.060 | 0.12 |
8KHz | WSOLA PLC - 2% loss | 1004 | 0.100 | 0.20 |
8KHz | WSOLA PLC - 5% loss | 1541 | 0.154 | 0.31 |
8KHz | WSOLA PLC - 10% loss | 1803 | 0.180 | 0.36 |
8KHz | WSOLA PLC - 20% loss | 3102 | 0.310 | 0.61 |
8KHz | WSOLA PLC - 50% loss | 8431 | 0.843 | 1.67 |
8KHz | WSOLA discard 2% excess | 214 | 0.021 | 0.04 |
8KHz | WSOLA discard 5% excess | 488 | 0.049 | 0.10 |
8KHz | WSOLA discard 10% excess | 1178 | 0.118 | 0.23 |
8KHz | WSOLA discard 20% excess | 2009 | 0.201 | 0.40 |
8KHz | WSOLA discard 50% excess | 6432 | 0.643 | 1.27 |
8KHz | echo canceller 100ms tail len | 335870 | 33.587 | 66.50 |
8KHz | echo canceller 128ms tail len | 336225 | 33.623 | 66.57 |
8KHz | echo canceller 200ms tail len | 349240 | 34.924 | 69.15 |
8KHz | echo canceller 256ms tail len | 363206 | 36.321 | 71.91 |
8KHz | echo canceller 400ms tail len | 400026 | 40.003 | 79.21 |
8KHz | echo canceller 500ms tail len | 426646 | 42.665 | 84.48 |
8KHz | echo canceller 512ms tail len | 432291 | 43.229 | 85.59 |
8KHz | echo canceller 600ms tail len | 454965 | 45.496 | 90.08 |
8KHz | echo canceller 800ms tail len | 516487 | 51.649 | 102.26 |
8KHz | tone generator with single freq | 920 | 0.092 | 0.18 |
8KHz | tone generator with dual freq | 1428 | 0.143 | 0.28 |
8KHz | codec encode/decode - G.711 | 2701 | 0.270 | 0.53 |
8KHz | codec encode/decode - GSM | 75750 | 7.575 | 15.00 |
8KHz | codec encode/decode - iLBC | 2856203 | 285.620 | 565.53 |
8KHz | codec encode/decode - Speex 8Khz | 436162 | 43.616 | 86.36 |
8KHz | codec encode/decode - L16/8000/1 | 1704 | 0.170 | 0.34 |
8KHz | stream TX/RX - G.711 | 6786 | 0.679 | 1.34 |
8KHz | stream TX/RX - G.711 SRTP 32bit | 21688 | 2.169 | 4.29 |
8KHz | stream TX/RX - G.711 SRTP 32bit +auth | 33501 | 3.350 | 6.63 |
8KHz | stream TX/RX - G.711 SRTP 80bit | 21725 | 2.172 | 4.30 |
8KHz | stream TX/RX - G.711 SRTP 80bit +auth | 33551 | 3.355 | 6.64 |
8KHz | stream TX/RX - GSM | 82035 | 8.203 | 16.24 |
8KHz | stream TX/RX - GSM SRTP 32bit | 90890 | 9.089 | 18.00 |
8KHz | stream TX/RX - GSM SRTP 32bit + auth | 99334 | 9.933 | 19.67 |
8KHz | stream TX/RX - GSM SRTP 80bit | 90893 | 9.089 | 18.00 |
8KHz | stream TX/RX - GSM SRTP 80bit + auth | 99356 | 9.936 | 19.67 |
16KHz | get from memplayer | 239 | 0.024 | 0.05 |
16KHz | conference bridge with 1 call | 12780 | 1.278 | 2.53 |
16KHz | conference bridge with 2 calls | 23052 | 2.305 | 4.56 |
16KHz | conference bridge with 4 calls | 43174 | 4.317 | 8.55 |
16KHz | conference bridge with 8 calls | 82096 | 8.210 | 16.26 |
16KHz | conference bridge with 16 calls | 158565 | 15.856 | 31.40 |
16KHz | upsample+downsample - linear | 11469 | 1.147 | 2.27 |
16KHz | upsample+downsample - small filter | 133088 | 13.309 | 26.35 |
16KHz | upsample+downsample - large filter | 1739742 | 173.974 | 344.47 |
16KHz | WSOLA PLC - 0% loss | 980 | 0.098 | 0.19 |
16KHz | WSOLA PLC - 2% loss | 1910 | 0.191 | 0.38 |
16KHz | WSOLA PLC - 5% loss | 3734 | 0.373 | 0.74 |
16KHz | WSOLA PLC - 10% loss | 7867 | 0.787 | 1.56 |
16KHz | WSOLA PLC - 20% loss | 13007 | 1.301 | 2.58 |
16KHz | WSOLA PLC - 50% loss | 29022 | 2.902 | 5.75 |
16KHz | WSOLA discard 2% excess | 551 | 0.055 | 0.11 |
16KHz | WSOLA discard 5% excess | 1027 | 0.103 | 0.20 |
16KHz | WSOLA discard 10% excess | 1973 | 0.197 | 0.39 |
16KHz | WSOLA discard 20% excess | 10454 | 1.045 | 2.07 |
16KHz | WSOLA discard 50% excess | 22276 | 2.228 | 4.41 |
16KHz | echo canceller 100ms tail len | 664649 | 66.465 | 131.60 |
16KHz | echo canceller 128ms tail len | 682686 | 68.269 | 135.17 |
16KHz | echo canceller 200ms tail len | 720924 | 72.092 | 142.74 |
16KHz | echo canceller 256ms tail len | 752928 | 75.293 | 149.08 |
16KHz | echo canceller 400ms tail len | 877528 | 87.753 | 173.75 |
16KHz | echo canceller 500ms tail len | 970559 | 97.056 | 192.17 |
16KHz | echo canceller 512ms tail len | 989839 | 98.984 | 195.99 |
16KHz | echo canceller 600ms tail len | 1065465 | 106.547 | 210.96 |
16KHz | echo canceller 800ms tail len | 1285075 | 128.508 | 254.44 |
16KHz | tone generator with single freq | 1617 | 0.162 | 0.32 |
16KHz | tone generator with dual freq | 2632 | 0.263 | 0.52 |
16KHz | codec encode/decode - G.722 | 148080 | 14.808 | 29.32 |
16KHz | codec encode/decode - Speex 16Khz | 979202 | 97.920 | 193.88 |
16KHz | codec encode/decode - L16/16000/1 | 3244 | 0.324 | 0.64 |
16KHz | stream TX/RX - G.722 | 155685 | 15.568 | 30.83 |
PJSIP-0.9.0, Linux, Pentium3, gcc
Hardware: | IBM X21 Notebook |
Platform: | Linux 2.6.23 |
Processor: | Pentium III |
Speed: | 700 MHz |
Assumed MIPS: | 1895.6 MIPS |
BogoMIPS: | 1395.36 |
Compilation: | -O3 -march=pentium3 -fomit-frame-pointer -DNDEBUG |
gcc: | version 4.2.3 |
Result:
02:01:45.561 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=700Mhz, 1895.6 MIPS

Clock Rate | Item | Time (usec) | CPU (%) | MIPS |
---|---|---|---|---|
8KHz | get from memplayer | 23 | 0.002 | 0.04 |
8KHz | conference bridge with 1 call | 800 | 0.080 | 1.52 |
8KHz | conference bridge with 2 calls | 1395 | 0.140 | 2.64 |
8KHz | conference bridge with 4 calls | 2522 | 0.252 | 4.78 |
8KHz | conference bridge with 8 calls | 4704 | 0.470 | 8.92 |
8KHz | conference bridge with 16 calls | 9146 | 0.915 | 17.34 |
8KHz | upsample+downsample - linear | 589 | 0.059 | 1.12 |
8KHz | upsample+downsample - small filter | 9563 | 0.956 | 18.13 |
8KHz | upsample+downsample - large filter | 46644 | 4.664 | 88.42 |
8KHz | WSOLA PLC - 0% loss | 107 | 0.011 | 0.20 |
8KHz | WSOLA PLC - 2% loss | 240 | 0.024 | 0.45 |
8KHz | WSOLA PLC - 5% loss | 466 | 0.047 | 0.88 |
8KHz | WSOLA PLC - 10% loss | 524 | 0.052 | 0.99 |
8KHz | WSOLA PLC - 20% loss | 958 | 0.096 | 1.82 |
8KHz | WSOLA PLC - 50% loss | 2667 | 0.267 | 5.06 |
8KHz | WSOLA discard 2% excess | 57 | 0.006 | 0.11 |
8KHz | WSOLA discard 5% excess | 142 | 0.014 | 0.27 |
8KHz | WSOLA discard 10% excess | 364 | 0.036 | 0.69 |
8KHz | WSOLA discard 20% excess | 631 | 0.063 | 1.20 |
8KHz | WSOLA discard 50% excess | 2081 | 0.208 | 3.94 |
8KHz | echo canceller 100ms tail len | 40050 | 4.005 | 75.92 |
8KHz | echo canceller 128ms tail len | 33179 | 3.318 | 62.89 |
8KHz | echo canceller 200ms tail len | 35161 | 3.516 | 66.65 |
8KHz | echo canceller 256ms tail len | 37470 | 3.747 | 71.03 |
8KHz | echo canceller 400ms tail len | 45104 | 4.510 | 85.50 |
8KHz | echo canceller 500ms tail len | 50504 | 5.050 | 95.74 |
8KHz | echo canceller 512ms tail len | 50940 | 5.094 | 96.56 |
8KHz | echo canceller 600ms tail len | 56113 | 5.611 | 106.37 |
8KHz | echo canceller 800ms tail len | 71677 | 7.168 | 135.87 |
8KHz | tone generator with single freq | 1758 | 0.176 | 3.33 |
8KHz | tone generator with dual freq | 3506 | 0.351 | 6.65 |
8KHz | codec encode/decode - G.711 | 357 | 0.036 | 0.68 |
8KHz | codec encode/decode - GSM | 11382 | 1.138 | 21.58 |
8KHz | codec encode/decode - iLBC | 46894 | 4.689 | 88.89 |
8KHz | codec encode/decode - Speex 8Khz | 64428 | 6.443 | 122.13 |
8KHz | codec encode/decode - L16/8000/1 | 248 | 0.025 | 0.47 |
8KHz | stream TX/RX - G.711 | 617 | 0.062 | 1.17 |
8KHz | stream TX/RX - G.711 SRTP 32bit | 1751 | 0.175 | 3.32 |
8KHz | stream TX/RX - G.711 SRTP 32bit +auth | 3161 | 0.316 | 5.99 |
8KHz | stream TX/RX - G.711 SRTP 80bit | 1773 | 0.177 | 3.36 |
8KHz | stream TX/RX - G.711 SRTP 80bit +auth | 3108 | 0.311 | 5.89 |
8KHz | stream TX/RX - GSM | 11755 | 1.176 | 22.28 |
8KHz | stream TX/RX - GSM SRTP 32bit | 12439 | 1.244 | 23.58 |
8KHz | stream TX/RX - GSM SRTP 32bit + auth | 13285 | 1.329 | 25.18 |
8KHz | stream TX/RX - GSM SRTP 80bit | 12270 | 1.227 | 23.26 |
8KHz | stream TX/RX - GSM SRTP 80bit + auth | 13358 | 1.336 | 25.32 |
16KHz | get from memplayer | 27 | 0.003 | 0.05 |
16KHz | conference bridge with 1 call | 1522 | 0.152 | 2.89 |
16KHz | conference bridge with 2 calls | 2711 | 0.271 | 5.14 |
16KHz | conference bridge with 4 calls | 4772 | 0.477 | 9.05 |
16KHz | conference bridge with 8 calls | 8913 | 0.891 | 16.90 |
16KHz | conference bridge with 16 calls | 18759 | 1.876 | 35.56 |
16KHz | upsample+downsample - linear | 1136 | 0.114 | 2.15 |
16KHz | upsample+downsample - small filter | 19231 | 1.923 | 36.45 |
16KHz | upsample+downsample - large filter | 93066 | 9.307 | 176.42 |
16KHz | WSOLA PLC - 0% loss | 177 | 0.018 | 0.34 |
16KHz | WSOLA PLC - 2% loss | 534 | 0.053 | 1.01 |
16KHz | WSOLA PLC - 5% loss | 1165 | 0.116 | 2.21 |
16KHz | WSOLA PLC - 10% loss | 2796 | 0.280 | 5.30 |
16KHz | WSOLA PLC - 20% loss | 4515 | 0.451 | 8.56 |
16KHz | WSOLA PLC - 50% loss | 10482 | 1.048 | 19.87 |
16KHz | WSOLA discard 2% excess | 168 | 0.017 | 0.32 |
16KHz | WSOLA discard 5% excess | 326 | 0.033 | 0.62 |
16KHz | WSOLA discard 10% excess | 654 | 0.065 | 1.24 |
16KHz | WSOLA discard 20% excess | 3526 | 0.353 | 6.68 |
16KHz | WSOLA discard 50% excess | 7507 | 0.751 | 14.23 |
16KHz | echo canceller 100ms tail len | 68547 | 6.855 | 129.94 |
16KHz | echo canceller 128ms tail len | 72619 | 7.262 | 137.66 |
16KHz | echo canceller 200ms tail len | 78054 | 7.805 | 147.96 |
16KHz | echo canceller 256ms tail len | 84739 | 8.474 | 160.63 |
16KHz | echo canceller 400ms tail len | 107738 | 10.774 | 204.23 |
16KHz | echo canceller 500ms tail len | 129879 | 12.988 | 246.20 |
16KHz | echo canceller 512ms tail len | 133796 | 13.380 | 253.62 |
16KHz | echo canceller 600ms tail len | 152166 | 15.217 | 288.45 |
16KHz | echo canceller 800ms tail len | 205415 | 20.542 | 389.38 |
16KHz | tone generator with single freq | 3489 | 0.349 | 6.61 |
16KHz | tone generator with dual freq | 6996 | 0.700 | 13.26 |
16KHz | codec encode/decode - G.722 | 32803 | 3.280 | 62.18 |
16KHz | codec encode/decode - Speex 16Khz | 156629 | 15.663 | 296.91 |
16KHz | codec encode/decode - L16/16000/1 | 434 | 0.043 | 0.82 |
16KHz | stream TX/RX - G.722 | 20959 | 2.096 | 39.73 |