
Author Topic: MMX, 3D Now! & SSE  (Read 6623 times)

kobold

  • Guest
MMX, 3D Now! & SSE
« on: December 22, 2005, 07:30:16 pm »
I found a project on SourceForge with MMX, 3DNow! and SSE optimized functions for C:

Quote
x86mph is an optimized library with vector, matrix, and vertex helpers (including an own TnL) It also contains memory helpers, and other stuff such as image processing (color inverting, changing bpp) taking advantage of x86's 3DNow!, MMX, and SSE


http://sourceforge.net/projects/x86mph

--> Is it possible to compile that with PellesC? The author uses NASM for the assembler parts and M$ VC++. I have no experience with either NASM or M$ VC++.


Here are some of my own snippets, used in one of my benchmarks:

Code: [Select]


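_inline int _DoesCPUSupportSSE( void )
{
   int result;
   _asm{
    // Check feature flag 25 in EDX for SSE support
      PUSH EBX                // CPUID overwrites EBX, which must be preserved
      MOV EAX, 1              // CPUID function 1: feature flags
      CPUID
      SHR EDX, 25
      AND EDX, 1
      MOV [result], EDX       // store the flag instead of leaving it on the stack
      POP EBX
   }
   return result;
}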


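_inline int _DoesCPUSupportMMX( void )
{
   int result;
   _asm{
    // Check feature flag 23 in EDX for MMX support
      PUSH EBX                // CPUID overwrites EBX, which must be preserved
      MOV EAX, 1              // CPUID function 1: feature flags
      CPUID
      SHR EDX, 23
      AND EDX, 1
      MOV [result], EDX       // store the flag instead of leaving it on the stack
      POP EBX
   }
   return result;
}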


_inline void _EndMMX( void )
{
   _asm
   {
      EMMS
   }
}


// memcpy >without< acceleration:
// size is the number of double words (4 bytes each) to copy, not the number of bytes
_inline void memcpy2( void *pDest, const void *pSrc, long size )
{
   _asm
   {
      //CLD
      PUSH ESI
      PUSH EDI
      //PUSH ECX
      MOV ESI, [pSrc]
      MOV EDI, [pDest]
      MOV ECX, [size] // number of double words to copy
      REP MOVSD // MOVSD for double word, MOVSW for word, or MOVSB for bytes
      //POP ECX
      POP EDI
      POP ESI
   }
}


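// MMX accelerated version of memcpy (MOVNTQ also needs SSE or AMD's extended MMX):
// size must be a non-zero multiple of 64, because each loop pass copies 8 x 8 bytes!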
_inline void memcpy3( void *pDest, const void *pSrc, long size )
{
   _asm
   {
      PUSH ESI
      PUSH EDI
      MOV ECX, [size]
      SHR ECX, 3 // div 8 - 2^3
      MOV ESI, [pSrc]
      MOV EDI, [pDest]
      LEA ESI, [ESI+ECX*8]
      LEA EDI, [EDI+ECX*8]
      NEG ECX

      loop1:
      MOVQ mm0, qword ptr [ESI+ECX*8] // 8 bytes per register
      MOVQ mm1, qword ptr [ESI+ECX*8+8]
      MOVQ mm2, qword ptr [ESI+ECX*8+16]
      MOVQ mm3, qword ptr [ESI+ECX*8+24]
      MOVQ mm4, qword ptr [ESI+ECX*8+32]
      MOVQ mm5, qword ptr [ESI+ECX*8+40]
      MOVQ mm6, qword ptr [ESI+ECX*8+48]
      MOVQ mm7, qword ptr [ESI+ECX*8+56]

      MOVNTQ qword ptr [EDI+ECX*8],    mm0
      MOVNTQ qword ptr [EDI+ECX*8+8],  mm1
      MOVNTQ qword ptr [EDI+ECX*8+16], mm2
      MOVNTQ qword ptr [EDI+ECX*8+24], mm3
      MOVNTQ qword ptr [EDI+ECX*8+32], mm4
      MOVNTQ qword ptr [EDI+ECX*8+40], mm5
      MOVNTQ qword ptr [EDI+ECX*8+48], mm6
      MOVNTQ qword ptr [EDI+ECX*8+56], mm7

      ADD ECX, 8
      JNZ loop1
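      SFENCE // order the non-temporal MOVNTQ stores before later memory accesses
      //EMMS // left to the caller, see _EndMMX()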
      POP EDI
      POP ESI
   }
}
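
The two feature checks above differ only in the bit they test, so the decoding can also be done in plain C once EDX has been read with a single CPUID call. This is only a rough sketch: the names `cpu_feature_set` and `CPUID_EDX_*` are made up for illustration and are not part of x86mph or Pelles C, and the caller is assumed to supply the EDX value from CPUID function 1.

```c
#include <stdint.h>

/* Bit positions in EDX after CPUID with EAX = 1 (feature flags).
   The enum names are hypothetical; the bit numbers are the documented ones. */
enum {
    CPUID_EDX_MMX  = 23,
    CPUID_EDX_SSE  = 25,
    CPUID_EDX_SSE2 = 26
};

/* Returns 1 if the given feature bit is set in edx, otherwise 0.
   edx is assumed to come from one CPUID call made elsewhere. */
int cpu_feature_set(uint32_t edx, unsigned bit)
{
    return (int)((edx >> bit) & 1u);
}
```

With this, CPUID only has to run once at startup and the saved EDX value can be queried for MMX, SSE and SSE2 afterwards.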






Usage:

Code: [Select]


...
// Init a source and a dest array - you are free to fill it with data  ;)
int test1[ARRAY];
int test2[ARRAY];
...

...
// Check if the feature is present:
if ( _DoesCPUSupportMMX() == 1 )
{
   puts("MMX... OK");
   // do a mmx operation...
}

if ( _DoesCPUSupportSSE() == 1 )
{
   puts("SSE... OK");
   // do a sse operation...
}
...

...
// if you use MOVSD for double words you have to divide the size of the array
// in bytes by sizeof(int), because memcpy2 counts double words and an int is
// a double word (4 bytes) on 32-bit x86

memcpy2( test1, test2, sizeof(test2) / sizeof(int) );
...

...
// _EndMMX executes EMMS, which marks the FPU registers as empty again
// so that normal floating point code works after the MMX copy
memcpy3( test1, test2, sizeof(test2) );
_EndMMX();
...
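
Since the memcpy3 loop moves 64 bytes per pass, a buffer of arbitrary size has to be split into whole 64-byte blocks plus a small remainder. The following is only a sketch of that split in portable C: `copy_blocks` is a hypothetical stand-in for memcpy3 (here it simply forwards to the standard memcpy so the example builds with any compiler), and `copy_any_size` is an invented name.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 64u  /* bytes moved per pass of the memcpy3 loop (8 x MOVQ) */

/* Hypothetical stand-in for memcpy3: it only forwards to memcpy here.
   bytes is always a non-zero multiple of BLOCK_BYTES when this is called. */
static void copy_blocks(void *dest, const void *src, size_t bytes)
{
    memcpy(dest, src, bytes);
}

/* Copies any number of bytes: whole 64-byte blocks go through the block
   routine, the remaining 0..63 bytes through plain memcpy. */
void copy_any_size(void *dest, const void *src, size_t bytes)
{
    size_t bulk = bytes - (bytes % BLOCK_BYTES);  /* largest multiple of 64 */
    size_t tail = bytes - bulk;                   /* leftover bytes */

    if (bulk != 0)
        copy_blocks(dest, src, bulk);
    if (tail != 0)
        memcpy((unsigned char *)dest + bulk,
               (const unsigned char *)src + bulk, tail);
}
```

The `bulk != 0` check matters for the real routine as well, because the memcpy3 loop copies one block before it tests its counter.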




Depending on the size of the buffers, the number of repeats and the type of optimizations, I got totally different results.

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2169
    • http://www.smorgasbordet.com
MMX, 3D Now! & SSE
« Reply #1 on: December 23, 2005, 09:56:34 am »
Sounds interesting. I will try to look at it...

Pelle
/Pelle

kobold

  • Guest
MMX, 3D Now! & SSE
« Reply #2 on: December 24, 2005, 10:56:06 pm »
Here is the output of my attempt:

Code: [Select]
Building x87_math.obj.
*** Error: ml.exe -c -nologo -coff -W1 -Cu -Fo"E:\Programme\PellesC\Projects\x86mph\output\x87_math.obj" "E:\Programme\PellesC\Projects\x86mph\x87_math.asm"
*** Error: Das System kann die angegebene Datei nicht finden.  ("The system cannot find the specified file")
*** Error code: -1 ***
Done.


But at first I tried it with devcpp. I got some .o files and one .a file ("x86mph.a").

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2169
    • http://www.smorgasbordet.com
MMX, 3D Now! & SSE
« Reply #3 on: December 25, 2005, 04:53:27 pm »
I have only looked at this very briefly, so far. It should be possible to download NASM from sourceforge, go to Project options -> Macros tab, and edit the AS macro (to something like C:\NASM\blah blah) - removing the default "ML.EXE".

I can import the project file into the IDE, and when I try building, ML complains about the syntax of the first ASM file. Maybe it works if NASM is installed (I haven't tried it)?

Pelle
/Pelle

kobold

  • Guest
MMX, 3D Now! & SSE
« Reply #4 on: December 25, 2005, 06:32:23 pm »
Tried NASM:

Code: [Select]
Building x87_math.obj.
nasm: fatal: unrecognized debug format `oE:\Programme\PellesC\Projects\x86mph\output\x87_math.obj' for output format `bin'
type `nasm -h' for help
*** Error code: 1 ***
Done.


 :(


PS: With the ASFLAGS "-f win32 -y null" it looks much better.

Code: [Select]

Building x87_math.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building Declarations.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building amalloc.obj.
E:\Programme\PellesC\Projects\x86mph\amalloc.c(37): fatal error #1035: Could not find include file "../../Include/x86mph.h".
*** Error code: 1 ***
Done.


 :)

kobold

  • Guest
MMX, 3D Now! & SSE
« Reply #5 on: December 25, 2005, 07:02:23 pm »
Is it possible that the MSVC import did not keep the directory structure? I had to change some paths and copy two subdirectories into the project folder, because the importer copied every file into the project folder without its subfolders.

Progress:

Code: [Select]
Building x87_math.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building Declarations.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building amalloc.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building Memory.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building AMD_Geom.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE_Fog.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE_Geom.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE_Lights.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building AMD_Matrix.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE_Matrix.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE_Vertex.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building Matrix.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building Vertex.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building Fog.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building Geom.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building Lights.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
E:\Programme\PellesC\Projects\x86mph\Lights.c(63): warning #2027: Missing prototype for 'CreateLight'.
E:\Programme\PellesC\Projects\x86mph\Lights.c(90): warning #2154: Unreachable code.
E:\Programme\PellesC\Projects\x86mph\Lights.c(142): warning #2027: Missing prototype for 'Shutdown_xLight'.
E:\Programme\PellesC\Projects\x86mph\Lights.c(194): warning #2027: Missing prototype for 'PreProcess_Lights'.
E:\Programme\PellesC\Projects\x86mph\Lights.c(254): warning #2027: Missing prototype for 'Kill_PreProcess_Lights'.
E:\Programme\PellesC\Projects\x86mph\Lights.c(460): warning #2130: Result of unsigned comparison is constant.
E:\Programme\PellesC\Projects\x86mph\Lights.c(462): warning #2130: Result of unsigned comparison is constant.
E:\Programme\PellesC\Projects\x86mph\Lights.c(463): warning #2130: Result of unsigned comparison is constant.
E:\Programme\PellesC\Projects\x86mph\Lights.c(464): warning #2130: Result of unsigned comparison is constant.
Building TnL.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building 3DNow.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building ASM.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building CPUINFO_ASM.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building FPU.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building MMX.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building SSE.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building CPUINFO.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
E:\Programme\PellesC\Projects\x86mph\CPUINFO.c(56): warning #2027: Missing prototype for 'SSE_Support'.
E:\Programme\PellesC\Projects\x86mph\CPUINFO.c(95): warning #2027: Missing prototype for 'SSE2_Support'.
Building FPU_C.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building SSE_C.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building MMX_Image.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building MMX_NGineZ.obj.

valid debug formats for 'win32' output format are ('*' denotes default):
  * null      Null debug format
Building Image.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building x86mph.obj.
Use <stdlib.h> instead of non-standard <malloc.h>
Use <string.h> instead of non-standard <memory.h>
Building x86mph.lib.
POLIB: fatal error: File not found: 'E:\Programme\PellesC\Projects\x86mph\output\x87_math.obj'.
*** Error code: 1 ***
Done.


PS: Is there a way to determine the type of compiler with the preprocessor (#ifdef)?

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2169
    • http://www.smorgasbordet.com
MMX, 3D Now! & SSE
« Reply #6 on: December 26, 2005, 07:12:02 am »
Quote from: "kobold"
Is it possible, that the MSVC import did not save the directory structure?

Yes, I think the importer can have some trouble here. It will not handle all cases well. Maybe I can improve it in the future...

Quote from: "kobold"

PS: Is there a way to determine the type of compiler with the preprocessor (#ifdef)?

For Pelles C you can use (two underscores before and after POCC):
Code: [Select]

#ifdef __POCC__
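
A slightly fuller sketch of the same idea, in case the code has to build with several compilers. Only `__POCC__` comes from this thread; `_MSC_VER`, `__clang__` and `__GNUC__` are the usual predefined macros of Microsoft Visual C++, Clang and GCC, and the function name `compiler_name` is made up for the example. The order of the tests is an assumption about which compilers also define the macros of others (Clang defines `__GNUC__`, for example), so it may need adjusting.

```c
/* Returns a short name for the compiler that built this file.
   compiler_name is a hypothetical helper, not part of any library. */
const char *compiler_name(void)
{
#if defined(__POCC__)
    return "Pelles C";
#elif defined(_MSC_VER)
    return "Microsoft Visual C++";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}
```

The same `#if defined(...)` chain can wrap compiler-specific code, for example to choose between the `_asm { }` blocks shown earlier and another compiler's inline assembler syntax.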


Pelle
/Pelle