MMX, 3D Now! & SSE

kobold:
I found a project on sourceforge about MMX, 3D Now! and SSE optimized functions for C:


--- Quote ---x86mph is an optimized library with vector, matrix, and vertex helpers (including an own TnL) It also contains memory helpers, and other stuff such as image processing (color inverting, changing bpp) taking advantage of x86's 3DNow!, MMX, and SSE
--- End quote ---


http://sourceforge.net/projects/x86mph

--> Is it possible to compile that with Pelles C? The author uses NASM for the assembler parts and M$ VC++. I have no experience with either NASM or M$ VC++.


Here are some of my own snippets, used in one of my benchmarks:


--- Code: ---

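_inline int _DoesCPUSupportSSE( void )
{
   int result;
   _asm{
    // Check feature flag 25 in EDX for SSE support
      MOV EAX, 1              // CPUID function 1: feature flags
      CPUID                   // clobbers EAX, EBX, ECX and EDX
      SHR EDX, 25
      AND EDX, 1
      MOV [result], EDX       // no PUSH here: it would unbalance the stack
   }
   return result;
}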


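_inline int _DoesCPUSupportMMX( void )
{
   int result;
   _asm{
    // Check feature flag 23 in EDX for MMX support
      MOV EAX, 1              // CPUID function 1: feature flags
      CPUID                   // clobbers EAX, EBX, ECX and EDX
      SHR EDX, 23
      AND EDX, 1
      MOV [result], EDX       // no PUSH here: it would unbalance the stack
   }
   return result;
}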


_inline void _EndMMX( void )
{
   _asm
   {
      EMMS
   }
}


// memcpy >without< acceleration:
_inline void memcpy2( void *pDest, const void *pSrc, long size )
{
    _asm
   {
      //CLD
      PUSH ESI
      PUSH EDI
      //PUSH ECX
      MOV ESI, [pSrc]
      MOV EDI, [pDest]
      MOV ECX, [size] // number of double words (4-byte units) to copy
      REP MOVSD // MOVSD for double word, MOVSW for word, or MOVSB for bytes
      //POP ECX
      POP EDI
      POP ESI
   }
}


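// MMX accelerated version of memcpy (MOVNTQ also needs SSE or AMD's extended MMX):
// size must be a multiple of 64! The loop copies 8 x 8 bytes per pass and
// only exits when ECX reaches exactly 0.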
_inline void memcpy3( void *pDest, const void *pSrc, long size )
{
   _asm
   {
      PUSH ESI
      PUSH EDI
      MOV ECX, [size]
      SHR ECX, 3 // div 8 - 2^3
      MOV ESI, [pSrc]
      MOV EDI, [pDest]
      LEA ESI, [ESI+ECX*8]
      LEA EDI, [EDI+ECX*8]
      NEG ECX

      loop1:
      MOVQ mm0, qword ptr [ESI+ECX*8] // 8 bytes per register
      MOVQ mm1, qword ptr [ESI+ECX*8+8]
      MOVQ mm2, qword ptr [ESI+ECX*8+16]
      MOVQ mm3, qword ptr [ESI+ECX*8+24]
      MOVQ mm4, qword ptr [ESI+ECX*8+32]
      MOVQ mm5, qword ptr [ESI+ECX*8+40]
      MOVQ mm6, qword ptr [ESI+ECX*8+48]
      MOVQ mm7, qword ptr [ESI+ECX*8+56]

      MOVNTQ qword ptr [EDI+ECX*8],    mm0
      MOVNTQ qword ptr [EDI+ECX*8+8],  mm1
      MOVNTQ qword ptr [EDI+ECX*8+16], mm2
      MOVNTQ qword ptr [EDI+ECX*8+24], mm3
      MOVNTQ qword ptr [EDI+ECX*8+32], mm4
      MOVNTQ qword ptr [EDI+ECX*8+40], mm5
      MOVNTQ qword ptr [EDI+ECX*8+48], mm6
      MOVNTQ qword ptr [EDI+ECX*8+56], mm7

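      ADD ECX, 8 // 8 qwords (64 bytes) were copied in this pass
      JNZ loop1
      SFENCE // order the non-temporal MOVNTQ stores before returning
      //EMMS // the caller runs _EndMMX() before using the FPU again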
      POP EDI
      POP ESI
   }
}




--- End code ---
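The two detection functions above only differ in which bit of EDX they test after CPUID function 1. As a minimal sketch of that bit test in plain C (no inline assembler, so it compiles anywhere), something like the following could be used. The names cpu_feature_bit, CPUID_EDX_MMX_BIT and CPUID_EDX_SSE_BIT are made up for this illustration, and the EDX value itself still has to come from CPUID:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the EDX value returned by CPUID function 1.
   The enum names are hypothetical, chosen for this sketch. */
enum { CPUID_EDX_MMX_BIT = 23, CPUID_EDX_SSE_BIT = 25 };

/* Returns 1 if `bit` is set in the EDX feature word, otherwise 0.
   This is the same SHR / AND 1 sequence as in the assembler version. */
static int cpu_feature_bit(uint32_t edx, int bit)
{
    assert(bit >= 0 && bit < 32);
    return (int)((edx >> bit) & 1u);
}
```

With a made-up sample value such as 0x0383FBFF, both cpu_feature_bit(edx, CPUID_EDX_MMX_BIT) and cpu_feature_bit(edx, CPUID_EDX_SSE_BIT) return 1, because bits 23 and 25 are set in it.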



Usage:


--- Code: ---

...
// Init a source and a dest array - you are free to fill it with data  ;)
int test1[ARRAY];
int test2[ARRAY];
...

...
// Check if the feature is present:
if ( _DoesCPUSupportMMX() == 1 )
{
   puts("MMX... OK");
   // do a mmx operation...
}

if ( _DoesCPUSupportSSE() == 1 )
{
   puts("SSE... OK");
   // do a sse operation...
}
...

...
// MOVSD copies double words (4 bytes each), so pass the number of double
// words: divide the size of the array in bytes by sizeof(int), because an
// int is a double word (4 bytes) on 32-bit x86

memcpy2( test1, test2, sizeof(test2) / sizeof(int) );
...

...
// _EndMMX (EMMS) marks the FPU registers as empty again, so x87 code works after MMX
memcpy3( test1, test2, sizeof(test2) );
_EndMMX();
...


--- End code ---
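One thing to watch with memcpy3: its loop moves 64 bytes per pass and only leaves when the counter hits exactly zero, so the byte count handed to it has to be a multiple of 64. A rough sketch of how an arbitrary size could be split into a 64-byte bulk part and a small tail is shown below. The names BLOCK_BYTES, bulk_part, copy_any_size and bulk_copy_stub are invented for this example, and bulk_copy_stub is only a portable stand-in where memcpy3 would normally be passed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 64  /* 8 MOVQ loads x 8 bytes per loop pass */

/* Same signature as memcpy3, so either can be passed in. */
typedef void (*bulk_copy_fn)(void *dest, const void *src, long size);

/* Portable stand-in for memcpy3 (assumption: used only for testing). */
static void bulk_copy_stub(void *dest, const void *src, long size)
{
    memcpy(dest, src, (size_t)size);
}

/* Largest multiple of BLOCK_BYTES that is not bigger than `size`. */
static size_t bulk_part(size_t size)
{
    return size - (size % BLOCK_BYTES);
}

/* Copies `size` bytes: the bulk goes through `bulk_copy`,
   the remaining 0..63 bytes through plain memcpy. */
static void copy_any_size(void *dest, const void *src, size_t size,
                          bulk_copy_fn bulk_copy)
{
    size_t bulk = bulk_part(size);
    size_t tail = size - bulk;

    assert(dest != NULL && src != NULL && bulk_copy != NULL);
    if (bulk > 0)
        bulk_copy(dest, src, (long)bulk);
    if (tail > 0)
        memcpy((unsigned char *)dest + bulk,
               (const unsigned char *)src + bulk, tail);
}
```

For a 150-byte buffer this hands 128 bytes to the bulk routine and copies the last 22 bytes with memcpy. The same idea applies to memcpy2, where the unit is 4 bytes instead of 64.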



Depending on the buffer size, the number of repeats, and the type of optimization, I got totally different results.
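For anyone who wants to repeat such a comparison, here is a minimal timing sketch using only the standard clock() function. It is not the benchmark I used, just an illustration; the names copy_fn, copy_reference and time_copy are made up, and copy_reference is a portable baseline with the same signature as memcpy3:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the copy routines that take a byte count. */
typedef void (*copy_fn)(void *dest, const void *src, long size);

/* Portable baseline: the C library memcpy behind the same signature. */
static void copy_reference(void *dest, const void *src, long size)
{
    memcpy(dest, src, (size_t)size);
}

/* Calls `copy` `repeats` times and returns the elapsed CPU time in
   seconds as measured by clock(). */
static double time_copy(copy_fn copy, void *dest, const void *src,
                        long size, int repeats)
{
    clock_t start, end;
    int i;

    assert(copy != NULL && repeats > 0);
    start = clock();
    for (i = 0; i < repeats; i++)
        copy(dest, src, size);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy once with copy_reference and once with memcpy3, for several buffer sizes and repeat counts, gives numbers that can be compared directly. memcpy2 takes a double-word count instead of a byte count, so it would need a small wrapper first. clock() is coarse, so short runs probably need a high repeat count to show anything.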

Pelle:
Sounds interesting. I will try to look at it...

Pelle

kobold:
Here is the output of my attempt:


--- Code: ---Building x87_math.obj.
*** Error: ml.exe -c -nologo -coff -W1 -Cu -Fo"E:\Programme\PellesC\Projects\x86mph\output\x87_math.obj" "E:\Programme\PellesC\Projects\x86mph\x87_math.asm"
*** Error: The system cannot find the file specified.
*** Error code: -1 ***
Done.

--- End code ---


Before that I tried it with Dev-C++ and got some .o files and one .a file ("x86mph.a").

Pelle:
I have only looked at this very briefly so far. It should be possible to download NASM from SourceForge, go to Project options -> Macros tab, and edit the AS macro (to something like C:\NASM\blah blah), replacing the default "ML.EXE".

I can import the project file into the IDE, but when I try building, ML complains about the syntax of the first ASM file. Maybe it works if NASM is installed (I haven't tried it)?

Pelle

kobold:
I tried NASM:


--- Code: ---Building x87_math.obj.
nasm: fatal: unrecognized debug format `oE:\Programme\PellesC\Projects\x86mph\output\x87_math.obj' for output format `bin'
type `nasm -h' for help
*** Error code: 1 ***
Done.
--- End code ---


 :(


PS: With the ASFLAGS "-f win32 -y null" it looks much better.


--- Code: ---
Building x87_math.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building Declarations.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building amalloc.obj.
E:\Programme\PellesC\Projects\x86mph\amalloc.c(37): fatal error #1035: Could not find include file "../../Include/x86mph.h".
*** Error code: 1 ***
Done.
--- End code ---


 :)
