Changes between Version 5 and Version 6 of PJMEDIA-MIPS
- Timestamp: Jul 5, 2008 9:40:48 AM
--- PJMEDIA-MIPS (v5)
+++ PJMEDIA-MIPS (v6)
@@ -1,15 +1,20 @@
 = PJMEDIA MIPS Measurement =

-This page shows the CPU requirements/MIPS (Million Instructions Per Second) measurements of various PJMEDIA components which would be useful to evaluate PJMEDIA performance. Please do not interpret these numbers as an official or definite performance number, as these tests do not actually measure the actual number of instructions executed but rather the time, and there are many compilation flags in PJMEDIA as well as compiler switches that can be set to improve the performance.
+This page attempts to show the typical performance characteristic of various PJMEDIA components, which could be useful to evaluate PJMEDIA performance. Please do not interpret these numbers as an official or definite performance number, as there are many compilation flags in PJMEDIA as well as compiler switches that can be set to increase (or decrease) the performance.

 == Test Method ==

-Each test should measure the overall performance for both directions. So for example for resampling, the test shows the total upsample and downsample time in a single test, and for codec it will show the total encoding and decoding time.
-
-The test program depends on a correct setting of MIPS value of the processor being set correctly during compilation time. The test uses strictly one thread only.
-
-To measure the MIPS value of a component, the program calculates the time to process 1 second worth of audio samples using that component, then calculates the MIPS value based on the configured MIPS value of the processor.
-
-All the results below are done with the stock settings that come with PJSIP distribution. The test source code is available in '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).
+Each test should measure the overall performance for both directions. So for example for resampling, the test shows the total upsample and downsample time in a single test, and for codec it will show the total encoding and decoding time.
+
+The test program depends on correct settings of CPU_MHZ and MIPS value of the processor being set correctly during compilation time. We used the MIPS information in the following links to assume the MIPS value of the processor:
+- http://en.wikipedia.org/wiki/Million_instructions_per_second
+- http://en.wikipedia.org/wiki/ARM_architecture
+
+To measure the MIPS score of a component, the program calculates the time to process 1 second worth of audio samples using that component, then calculates the MIPS score based on the configured MIPS value of the processor. Because of this, the calculated MIPS shouldn't be interpreted as a ''real'' MIPS value since it's purely based on time measurement, and our assumed MIPS value for the processor may be wrong, and in platforms where floating-point is available, floating-point instructions will be used instead.
+
+The test uses strictly one thread only.
+
+All the results below are done with the default settings that come with PJSIP distribution. The test source code are available in '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).
+

 == Interpreting the Results ==
@@ -20,5 +25,5 @@

 '''Clock Rate:''' ::
-This shows the sampling rate of the component being tested, in Hz. We test both 8KHz and 16Hz. Most components can work in both 8KHz and 16KHz hence there will be two test results row for the same component, each with different clock rate. Some components (mostly related codec) can only work in one of the clock rate (e.g. GSM is only shown in 8KHz, while G.722 is only shown in 16KHz).
+This shows the sampling rate of the component being tested, in KHz. We test both 8KHz and 16KHz. Most components can work in both 8KHz and 16KHz hence there will be two test result rows for the same component, each with different clock rate. Some components (mostly related codec) can only work in one of the clock rate (e.g. GSM is only shown in 8KHz, while G.722 is only shown in 16KHz) hence there will only be one test result row for these components.

 '''Time (usec):''' ::
@@ -26,15 +31,15 @@

 '''CPU (%):''' ::
-From the elapsed time above, we can measure how much CPU usage needed to run this component in real-time. For example, if the time elapsed is 1 second (one million microseconds) then this component will take 100% of CPU time when run in real-time. Or if the time elapsed is 0.5 second (500 thousands microseconds) then this component will take 50% of the CPU time when run in real-time.
-
-The CPU percentage maybe larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second.
+This shows how much CPU usage (in percent) this component will consume when running it in real-time. The value is derived from the time measurement above. For example, if the time elapsed is 1 second then this component will take 100% of CPU time when run in real-time. Or if the time elapsed is 0.5 second then this component will take 50% of the CPU time when run in real-time.
+
+The CPU percentage maybe larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second. It may happen when we perform the test on slower processor.

 '''MIPS:''' ::
-Also from the elapsed time above, we can measure the MIPS needed to run this component in real-time, since we know (or we can assume) the MIPS value of the processor.
+The MIPS (Million Instructions per Second) score roughly means how many instructions will be executed by this component per second when we run this component in real-time. The value is derived from the time measurement above, and calculated based on the assumed MIPS value of the processor. Once again, the score may be incorrect for many reasons so it shouldn't be interpreted as an official/definite score, and especially one MUST NOT use the MIPS score to compare performance of different processor families/architectures.


 === Rows ===

-The rows shows the measurement result of a particular components. The components tested are described below.
+The rows show the measurement result of a particular components. The components tested are described below.

 '''get from memplayer:''' ::
@@ -42,26 +47,28 @@

 '''conference bridge with N call(s):''' ::
-This measures the performance of the conference bridge with N calls. Note that we don't use actual call for the test since we only want to measure the conference bridge performance and not codec performance (this will be measured in separate tests). So for this test we use memplayer for each ''call'' to supply audio to the bridge. During the test all the calls (ports) will be connected to port zero and port zero will be connected to all calls. No connection among calls is created.
+This measures the performance of the conference bridge with N calls. Note that we don't use actual call for the test since we only want to measure the conference bridge performance and not codec performance (this will be measured in separate tests). So for this test we use memplayer for each "call" to supply audio to the bridge. During the test all the calls (ports) will be connected to port zero and port zero will be connected to all calls. No connection among calls are created.

 '''upsample+downsample:''' ::
-This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsample it twice the clock rate, then downsample it half the clock rate again so that the clock rate now is the same as originally. This test measures linear resampling and polyphase resampling using small filter and large filter.
+This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsample it twice the clock rate, then downsample it half the clock rate again so that the clock rate now is the same as originally. This test measures both linear and non-linear resampling using small filter and large filter. Some resampling backend algorithms may not support selecting between linear/non-linear and small/large filter, in that case the results will be equal for all settings.

 '''WSOLA PLC - N% loss:''' ::
-This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packet (a.k.a Packet Lost Concealment/PLC). Timing for various loss percentage is shown. The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.
+This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packet (a.k.a Packet Lost Concealment/PLC). Timing for various loss percentages are shown.
+
+The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.

 '''WSOLA discard N% excess:''' ::
-This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drifts). Timing for various excess percentage is shown. The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.
+This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drifts). Timing for various excess percentages are shown.

 '''echo canceller Nms tail len:''' ::
-This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the audio samples.
+This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the AEC input.

 '''tone generator with single/dual freq:''' ::
-This measures the performance of the tone generator to continuously generate single or dual frequency tone.
+This measures the performance of the tone generator to continuously generate single or dual frequency tone for 1 second.

 '''codec encode/decode:''' ::
-This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
+This measures the time to encode and then decode 1 second worth of audio samples using the specified codec for 1 second.

 '''stream TX/RX:''' ::
-This test is intended to measure the performance/overhead of the stream, which consist of RTP/RTCP processing and de-jitter buffering. In addition it also tests the performance of Secure RTP (SRTP) for various settings combination and codec bandwidth. Since the test here also consists of codec processing (actual encoding and decoding), you need to subtract the result with the result of the corresponding codec to measure the overhead of the stream and SRTP only.
+This test is intended to measure the performance/overhead of the stream, which consist of codec, RTP/RTCP processing, and de-jitter buffering. In addition it also tests the performance of Secure RTP (SRTP) for various setting combinations and codec bandwidth. Since the test here also consists of codec processing (encoding and decoding), you need to subtract the result with the result of the corresponding codec to measure the overhead of the stream and SRTP only.


@@ -274,2 +281,106 @@
 16KHz stream TX/RX - G.722                     20959   2.096    39.73
 }}}
+
+
+=== PJSIP-0.9.0, Windows XP, Pentium 4, Visual Studio 2005 ===
+
+||Hardware:||HP PC||
+||Platform:||Windows XP SP2||
+||Processor:||Pentium 4 (single core, no Hyper-Threading)||
+||Speed:||2.6 GHz||
+||Assumed MIPS:|| 8102 MIPS||
+||BogoMIPS:|| - ||
+||Compilation:|| Default Release settings (/O2) ||
+||Compiler:|| Visual Studio 2005 ||
+
+Result:
+
+{{{
+09:46:14.571 os_core_win32. pjlib 0.9.0-trunk for win32 initialized
+MIPS test, with CPU=2666Mhz, 8102.0 MIPS
+Clock  Item                                      Time     CPU     MIPS
+ Rate                                          (usec)    (%)
+----------------------------------------------------------------------
+8KHz  get from memplayer                          11   0.001     0.09
+8KHz  conference bridge with 1 call              337   0.034     2.73
+8KHz  conference bridge with 2 calls             512   0.051     4.15
+8KHz  conference bridge with 4 calls             919   0.092     7.45
+8KHz  conference bridge with 8 calls            1658   0.166    13.43
+8KHz  conference bridge with 16 calls           3180   0.318    25.76
+8KHz  upsample+downsample - linear               288   0.029     2.33
+8KHz  upsample+downsample - small filter        7822   0.782    63.37
+8KHz  upsample+downsample - large filter       38386   3.839   311.00
+8KHz  WSOLA PLC - 0% loss                         53   0.005     0.43
+8KHz  WSOLA PLC - 2% loss                         61   0.006     0.49
+8KHz  WSOLA PLC - 5% loss                        103   0.010     0.83
+8KHz  WSOLA PLC - 10% loss                       152   0.015     1.23
+8KHz  WSOLA PLC - 20% loss                       195   0.020     1.58
+8KHz  WSOLA PLC - 50% loss                       520   0.052     4.21
+8KHz  WSOLA discard 2% excess                      8   0.001     0.06
+8KHz  WSOLA discard 5% excess                     27   0.003     0.22
+8KHz  WSOLA discard 10% excess                    74   0.007     0.60
+8KHz  WSOLA discard 20% excess                   117   0.012     0.95
+8KHz  WSOLA discard 50% excess                   370   0.037     3.00
+8KHz  echo canceller 100ms tail len            20945   2.095   169.70
+8KHz  echo canceller 128ms tail len            20484   2.048   165.96
+8KHz  echo canceller 200ms tail len            21017   2.102   170.28
+8KHz  echo canceller 256ms tail len            21562   2.156   174.69
+8KHz  echo canceller 400ms tail len            23030   2.303   186.59
+8KHz  echo canceller 500ms tail len            24102   2.410   195.27
+8KHz  echo canceller 512ms tail len            24441   2.444   198.02
+8KHz  echo canceller 600ms tail len            25380   2.538   205.63
+8KHz  echo canceller 800ms tail len            28751   2.875   232.94
+8KHz  tone generator with single freq             84   0.008     0.68
+8KHz  tone generator with dual freq              125   0.013     1.01
+8KHz  codec encode/decode - G.711                135   0.014     1.09
+8KHz  codec encode/decode - GSM                 6898   0.690    55.89
+8KHz  codec encode/decode - iLBC               39783   3.978   322.32
+8KHz  codec encode/decode - Speex 8Khz         24543   2.454   198.85
+8KHz  codec encode/decode - L16/8000/1           161   0.016     1.30
+8KHz  stream TX/RX - G.711                       298   0.030     2.41
+8KHz  stream TX/RX - G.711 SRTP 32bit            633   0.063     5.13
+8KHz  stream TX/RX - G.711 SRTP 32bit +auth     1063   0.106     8.61
+8KHz  stream TX/RX - G.711 SRTP 80bit            634   0.063     5.14
+8KHz  stream TX/RX - G.711 SRTP 80bit +auth     1066   0.107     8.64
+8KHz  stream TX/RX - GSM                        7182   0.718    58.19
+8KHz  stream TX/RX - GSM SRTP 32bit             7353   0.735    59.57
+8KHz  stream TX/RX - GSM SRTP 32bit + auth      7693   0.769    62.33
+8KHz  stream TX/RX - GSM SRTP 80bit             7313   0.731    59.25
+8KHz  stream TX/RX - GSM SRTP 80bit + auth      7673   0.767    62.17
+16KHz get from memplayer                           8   0.001     0.06
+16KHz conference bridge with 1 call              592   0.059     4.80
+16KHz conference bridge with 2 calls             907   0.091     7.35
+16KHz conference bridge with 4 calls            1620   0.162    13.13
+16KHz conference bridge with 8 calls            3055   0.306    24.75
+16KHz conference bridge with 16 calls           5799   0.580    46.98
+16KHz upsample+downsample - linear               560   0.056     4.54
+16KHz upsample+downsample - small filter       15505   1.551   125.62
+16KHz upsample+downsample - large filter       76944   7.694   623.40
+16KHz WSOLA PLC - 0% loss                         52   0.005     0.42
+16KHz WSOLA PLC - 2% loss                        263   0.026     2.13
+16KHz WSOLA PLC - 5% loss                        113   0.011     0.92
+16KHz WSOLA PLC - 10% loss                       383   0.038     3.10
+16KHz WSOLA PLC - 20% loss                       742   0.074     6.01
+16KHz WSOLA PLC - 50% loss                      1757   0.176    14.24
+16KHz WSOLA discard 2% excess                      9   0.001     0.07
+16KHz WSOLA discard 5% excess                     69   0.007     0.56
+16KHz WSOLA discard 10% excess                   220   0.022     1.78
+16KHz WSOLA discard 20% excess                   403   0.040     3.27
+16KHz WSOLA discard 50% excess                  1301   0.130    10.54
+16KHz echo canceller 100ms tail len            42084   4.208   340.96
+16KHz echo canceller 128ms tail len            42697   4.270   345.93
+16KHz echo canceller 200ms tail len            43782   4.378   354.72
+16KHz echo canceller 256ms tail len            45008   4.501   364.65
+16KHz echo canceller 400ms tail len            49519   4.952   401.20
+16KHz echo canceller 500ms tail len            51945   5.194   420.86
+16KHz echo canceller 512ms tail len            52492   5.249   425.29
+16KHz echo canceller 600ms tail len            54984   5.498   445.48
+16KHz echo canceller 800ms tail len            60065   6.006   486.65
+16KHz tone generator with single freq            161   0.016     1.30
+16KHz tone generator with dual freq              239   0.024     1.94
+16KHz codec encode/decode - G.722               9354   0.935    75.79
+16KHz codec encode/decode - Speex 16Khz        51086   5.109   413.90
+16KHz codec encode/decode - L16/16000/1          304   0.030     2.46
+16KHz stream TX/RX - G.722                      9570   0.957    77.54
+}}}
+
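The CPU (%) and MIPS columns in the Pentium 4 result block above appear to follow directly from the Time (usec) column, as the "Interpreting the Results" text describes. The sketch below is a minimal illustration of that arithmetic in Python. The function names are hypothetical and the formulas are an assumption inferred from the printed rows (with the assumed processor value of 8102 MIPS), not taken from {{{mips_test.c}}}. It also shows the subtraction that the stream TX/RX description recommends for isolating stream overhead.

```python
USEC_PER_SEC = 1_000_000  # each test processes 1 second worth of audio


def cpu_percent(elapsed_usec: float) -> float:
    """Share of one CPU needed to run the component in real time, in percent."""
    return elapsed_usec / USEC_PER_SEC * 100.0


def mips_score(elapsed_usec: float, assumed_cpu_mips: float) -> float:
    """Scale the assumed MIPS value of the processor by the fraction of time used."""
    return assumed_cpu_mips * elapsed_usec / USEC_PER_SEC


def stream_overhead_usec(stream_usec: float, codec_usec: float) -> float:
    """Stream-only cost: stream TX/RX time minus the matching codec time."""
    return stream_usec - codec_usec


if __name__ == "__main__":
    assumed_mips = 8102.0  # "Assumed MIPS" from the hardware table above

    # Row "8KHz conference bridge with 1 call": 337 usec
    print(f"{cpu_percent(337):.3f} % CPU, {mips_score(337, assumed_mips):.2f} MIPS")

    # Rows "8KHz stream TX/RX - G.711" (298 usec) and
    # "8KHz codec encode/decode - G.711" (135 usec)
    overhead = stream_overhead_usec(298, 135)
    print(f"stream overhead: {overhead} usec, "
          f"{mips_score(overhead, assumed_mips):.2f} MIPS")
```

Because the score is only a rescaled time measurement, it inherits any error in the assumed MIPS value of the processor, which is why the page warns against comparing scores across processor families.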