Changeset 5699 for pjproject/trunk/third_party/yuv/source/row_win.cc
Timestamp: Nov 21, 2017 9:25:11 AM
Files: 1 edited
pjproject/trunk/third_party/yuv/source/row_win.cc
 r5633  r5699
  1411   1411    pavgb xmm2, xmm4
  1412   1412
  1413         - // step 2 - convert to U and V
  1414         - // from here down is very similar to Y code except
  1415         - // instead of 16 different pixels, its 8 pixels of U and 8 of V
         1413  + // step 2 - convert to U and V
         1414  + // from here down is very similar to Y code except
         1415  + // instead of 16 different pixels, its 8 pixels of U and 8 of V
  1416   1416    movdqa xmm1, xmm0
  1417   1417    movdqa xmm3, xmm2
     …      …
  1427   1427    paddb xmm0, xmm5 // -> unsigned
  1428   1428
  1429         - // step 3 - store 8 U and 8 V values
         1429  + // step 3 - store 8 U and 8 V values
  1430   1430    movlps qword ptr [edx], xmm0 // U
  1431   1431    movhps qword ptr [edx + edi], xmm0 // V
     …      …
  1483   1483    pavgb xmm2, xmm4
  1484   1484
  1485         - // step 2 - convert to U and V
  1486         - // from here down is very similar to Y code except
  1487         - // instead of 16 different pixels, its 8 pixels of U and 8 of V
         1485  + // step 2 - convert to U and V
         1486  + // from here down is very similar to Y code except
         1487  + // instead of 16 different pixels, its 8 pixels of U and 8 of V
  1488   1488    movdqa xmm1, xmm0
  1489   1489    movdqa xmm3, xmm2
     …      …
  1500   1500    packsswb xmm0, xmm1
  1501   1501
  1502         - // step 3 - store 8 U and 8 V values
         1502  + // step 3 - store 8 U and 8 V values
  1503   1503    movlps qword ptr [edx], xmm0 // U
  1504   1504    movhps qword ptr [edx + edi], xmm0 // V
     …      …
  1550   1550    vpavgb ymm2, ymm2, ymm4 // mutated by vshufps
  1551   1551
  1552         - // step 2 - convert to U and V
  1553         - // from here down is very similar to Y code except
  1554         - // instead of 32 different pixels, its 16 pixels of U and 16 of V
         1552  + // step 2 - convert to U and V
         1553  + // from here down is very similar to Y code except
         1554  + // instead of 32 different pixels, its 16 pixels of U and 16 of V
  1555   1555    vpmaddubsw ymm1, ymm0, ymm7 // U
  1556   1556    vpmaddubsw ymm3, ymm2, ymm7
     …      …
  1566   1566    vpaddb ymm0, ymm0, ymm5 // -> unsigned
  1567   1567
  1568         - // step 3 - store 16 U and 16 V values
         1568  + // step 3 - store 16 U and 16 V values
  1569   1569    vextractf128 [edx], ymm0, 0 // U
  1570   1570    vextractf128 [edx + edi], ymm0, 1 // V
     …      …
  1618   1618    vpavgb ymm2, ymm2, ymm4 // mutated by vshufps
  1619   1619
  1620         - // step 2 - convert to U and V
  1621         - // from here down is very similar to Y code except
  1622         - // instead of 32 different pixels, its 16 pixels of U and 16 of V
         1620  + // step 2 - convert to U and V
         1621  + // from here down is very similar to Y code except
         1622  + // instead of 32 different pixels, its 16 pixels of U and 16 of V
  1623   1623    vpmaddubsw ymm1, ymm0, ymm7 // U
  1624   1624    vpmaddubsw ymm3, ymm2, ymm7
     …      …
  1635   1635    vpshufb ymm0, ymm0, ymmword ptr kShufARGBToUV_AVX // for vshufps/vphaddw
  1636   1636
  1637         - // step 3 - store 16 U and 16 V values
         1637  + // step 3 - store 16 U and 16 V values
  1638   1638    vextractf128 [edx], ymm0, 0 // U
  1639   1639    vextractf128 [edx + edi], ymm0, 1 // V
     …      …
  1751   1751    pavgb xmm2, xmm4
  1752   1752
  1753         - // step 2 - convert to U and V
  1754         - // from here down is very similar to Y code except
  1755         - // instead of 16 different pixels, its 8 pixels of U and 8 of V
         1753  + // step 2 - convert to U and V
         1754  + // from here down is very similar to Y code except
         1755  + // instead of 16 different pixels, its 8 pixels of U and 8 of V
  1756   1756    movdqa xmm1, xmm0
  1757   1757    movdqa xmm3, xmm2
     …      …
  1767   1767    paddb xmm0, xmm5 // -> unsigned
  1768   1768
  1769         - // step 3 - store 8 U and 8 V values
         1769  + // step 3 - store 8 U and 8 V values
  1770   1770    movlps qword ptr [edx], xmm0 // U
  1771   1771    movhps qword ptr [edx + edi], xmm0 // V
     …      …
  1823   1823    pavgb xmm2, xmm4
  1824   1824
  1825         - // step 2 - convert to U and V
  1826         - // from here down is very similar to Y code except
  1827         - // instead of 16 different pixels, its 8 pixels of U and 8 of V
         1825  + // step 2 - convert to U and V
         1826  + // from here down is very similar to Y code except
         1827  + // instead of 16 different pixels, its 8 pixels of U and 8 of V
  1828   1828    movdqa xmm1, xmm0
  1829   1829    movdqa xmm3, xmm2
     …      …
  1839   1839    paddb xmm0, xmm5 // -> unsigned
  1840   1840
  1841         - // step 3 - store 8 U and 8 V values
         1841  + // step 3 - store 8 U and 8 V values
  1842   1842    movlps qword ptr [edx], xmm0 // U
  1843   1843    movhps qword ptr [edx + edi], xmm0 // V
     …      …
  1895   1895    pavgb xmm2, xmm4
  1896   1896
  1897         - // step 2 - convert to U and V
  1898         - // from here down is very similar to Y code except
  1899         - // instead of 16 different pixels, its 8 pixels of U and 8 of V
         1897  + // step 2 - convert to U and V
         1898  + // from here down is very similar to Y code except
         1899  + // instead of 16 different pixels, its 8 pixels of U and 8 of V
  1900   1900    movdqa xmm1, xmm0
  1901   1901    movdqa xmm3, xmm2
     …      …
  1911   1911    paddb xmm0, xmm5 // -> unsigned
  1912   1912
  1913         - // step 3 - store 8 U and 8 V values
         1913  + // step 3 - store 8 U and 8 V values
  1914   1914    movlps qword ptr [edx], xmm0 // U
  1915   1915    movhps qword ptr [edx + edi], xmm0 // V
     …      …
  2928   2928    packuswb xmm0, xmm0 // G
  2929   2929
  2930         - // Step 2: Weave into ARGB
         2930  + // Step 2: Weave into ARGB
  2931   2931    punpcklbw xmm0, xmm0 // GG
  2932   2932    movdqa xmm1, xmm0
     …      …
  2976   2976    vpackuswb ymm0, ymm0, ymm0 // G. still mutated: 3120
  2977   2977
  2978         - // TODO(fbarchard): Weave alpha with unpack.
  2979         - // Step 2: Weave into ARGB
         2978  + // TODO(fbarchard): Weave alpha with unpack.
         2979  + // Step 2: Weave into ARGB
  2980   2980    vpunpcklbw ymm1, ymm0, ymm0 // GG - mutates
  2981   2981    vpermq ymm1, ymm1, 0xd8
     …      …
  4068   4068    sub edi, esi
  4069   4069
  4070         - // 8 pixel loop.
         4070  + // 8 pixel loop.
  4071   4071    convertloop8:
  4072   4072    movq xmm0, qword ptr [esi] // alpha
     …      …
  4124   4124    sub edi, esi
  4125   4125
  4126         - // 32 pixel loop.
         4126  + // 32 pixel loop.
  4127   4127    convertloop32:
  4128   4128    vmovdqu ymm0, [esi] // alpha
     …      …
  4184   4184    jl convertloop4b // less than 4 pixels?
  4185   4185
  4186         - // 4 pixel loop.
         4186  + // 4 pixel loop.
  4187   4187    convertloop4:
  4188   4188    movdqu xmm3, [eax] // src argb
     …      …
  4213   4213    jl convertloop1b
  4214   4214
  4215         - // 1 pixel loop.
         4215  + // 1 pixel loop.
  4216   4216    convertloop1:
  4217   4217    movd xmm3, [eax] // src argb
     …      …
  5257   5257    packssdw xmm5, xmm5 // 16 bit shorts
  5258   5258
  5259         - // 4 pixel loop small blocks.
         5259  + // 4 pixel loop small blocks.
  5260   5260    s4:
  5261   5261    // top left
     …      …
  5299   5299    jmp l4b
  5300   5300
  5301         - // 4 pixel loop
         5301  + // 4 pixel loop
  5302   5302    l4:
  5303   5303    // top left
     …      …
  5351   5351    jl l1b
  5352   5352
  5353         - // 1 pixel loop
         5353  + // 1 pixel loop
  5354   5354    l1:
  5355   5355    movdqu xmm0, [eax]
     …      …
  5393   5393    jne l4b
  5394   5394
  5395         - // 4 pixel loop
         5395  + // 4 pixel loop
  5396   5396    l4:
  5397   5397    movdqu xmm2, [eax] // 4 argb pixels 16 bytes.
     …      …
  5439   5439    jl l1b
  5440   5440
  5441         - // 1 pixel loop
         5441  + // 1 pixel loop
  5442   5442    l1:
  5443   5443    movd xmm2, dword ptr [eax] // 1 argb pixel 4 bytes.
     …      …
  5482   5482    jl l4b
  5483   5483
  5484         - // setup for 4 pixel loop
         5484  + // setup for 4 pixel loop
  5485   5485    pshufd xmm7, xmm7, 0x44 // dup dudv
  5486   5486    pshufd xmm5, xmm5, 0 // dup 4, stride
     …      …
  5494   5494    addps xmm4, xmm4 // dudv *= 4
  5495   5495
  5496         - // 4 pixel loop
         5496  + // 4 pixel loop
  5497   5497    l4:
  5498   5498    cvttps2dq xmm0, xmm2 // x, y float to int first 2
     …      …
  5525   5525    jl l1b
  5526   5526
  5527         - // 1 pixel loop
         5527  + // 1 pixel loop
  5528   5528    l1:
  5529   5529    cvttps2dq xmm0, xmm2 // x, y float to int
     …      …
  5599   5599    jmp xloop99
  5600   5600
  5601         - // Blend 50 / 50.
         5601  + // Blend 50 / 50.
  5602   5602    xloop50:
  5603   5603    vmovdqu ymm0, [esi]
     …      …
  5609   5609    jmp xloop99
  5610   5610
  5611         - // Blend 100 / 0 - Copy row unchanged.
         5611  + // Blend 100 / 0 - Copy row unchanged.
  5612   5612    xloop100:
  5613   5613    rep movsb
     …      …
  5639   5639    mov eax, [esp + 8 + 20] // source_y_fraction (0..255)
  5640   5640    sub edi, esi
  5641         - // Dispatch to specialized filters if applicable.
         5641  + // Dispatch to specialized filters if applicable.
  5642   5642    cmp eax, 0
  5643   5643    je xloop100 // 0 /256. Blend 100 / 0.
     …      …
  5679   5679    jmp xloop99
  5680   5680
  5681         - // Blend 50 / 50.
         5681  + // Blend 50 / 50.
  5682   5682    xloop50:
  5683   5683    movdqu xmm0, [esi]
     …      …
  5690   5690    jmp xloop99
  5691   5691
  5692         - // Blend 100 / 0 - Copy row unchanged.
         5692  + // Blend 100 / 0 - Copy row unchanged.
  5693   5693    xloop100:
  5694   5694    movdqu xmm0, [esi]
     …      …
  5785   5785    je shuf_2103
  5786   5786
  5787         - // TODO(fbarchard): Use one source pointer and 3 offsets.
         5787  + // TODO(fbarchard): Use one source pointer and 3 offsets.
  5788   5788    shuf_any1:
  5789   5789    movzx ebx, byte ptr [esi]
     …      …
  5972   5972    pxor xmm3, xmm3 // 0 constant for zero extending bytes to ints.
  5973   5973
  5974         - // 2 pixel loop.
         5974  + // 2 pixel loop.
  5975   5975    convertloop:
  5976   5976    // pmovzxbd xmm0, dword ptr [eax] // BGRA pixel
     …      …
  6073   6073    sub edx, eax
  6074   6074
  6075         - // 8 pixel loop.
         6075  + // 8 pixel loop.
  6076   6076    convertloop:
  6077   6077    movdqu xmm2, xmmword ptr [eax] // 8 shorts
     …      …
  6111   6111    sub edx, eax
  6112   6112
  6113         - // 16 pixel loop.
         6113  + // 16 pixel loop.
  6114   6114    convertloop:
  6115   6115    vmovdqu ymm2, [eax] // 16 shorts
     …      …
  6145   6145    sub edx, eax
  6146   6146
  6147         - // 16 pixel loop.
         6147  + // 16 pixel loop.
  6148   6148    convertloop:
  6149   6149    vpmovzxwd ymm2, xmmword ptr [eax] // 8 shorts -> 8 ints
     …      …
  6253   6253    pxor xmm5, xmm5
  6254   6254
  6255         - // 4 pixel loop.
         6255  + // 4 pixel loop.
  6256   6256    convertloop:
  6257   6257    movdqu xmm0, xmmword ptr [eax] // generate luma ptr