I've only written a few fairly serious programs with Pelles C. The first one is a team milestone program that tracks all of the active folders on my folding team, from two raw data files of > 300,000 folders. It parses them, sorts them two different ways (by score and by name), and outputs reports using the assigned color codes, font sizes, etc., for the folding forum. What I like best about it is that it completes the run in about 8 seconds.
The second one was a test comparing an indexed search to a binary search, matching words from a large amount of text (a novel) against a pre-loaded array of words (there's those strings again!). I did this same test about 7 years ago, using Turbo C's compiler on a C2D (E6700 CPU) based system. It was a big win for the indexed search. This time, using an i7 (940) system, it was a virtual tie. Both searches could pick out each word, search for it, and finish up the entire novel (A Tale of Two Cities, in this case) in a matter of a few seconds.
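To give a rough idea of the two searches being compared, here's a minimal sketch. It is not the test program itself: the ten-word list, the function names (build_index, indexed_find, binary_find), and the choice of indexing by first letter are all assumptions for illustration. It also assumes the words are lowercase and the array is already sorted.

```c
#include <string.h>

/* Hypothetical sorted word list standing in for the pre-loaded array. */
static const char *words[] = {
    "age", "best", "city", "epoch", "foolishness",
    "times", "two", "was", "wisdom", "worst"
};
enum { NWORDS = sizeof words / sizeof words[0] };

/* first_idx[c] is the index of the first word starting with 'a' + c
 * (or of the next later word if none does); first_idx[26] == NWORDS. */
static int first_idx[27];

static void build_index(void)
{
    int c, i = 0;
    for (c = 0; c < 26; c++) {
        while (i < NWORDS && words[i][0] < 'a' + c)
            i++;
        first_idx[c] = i;
    }
    first_idx[26] = NWORDS;
}

/* Indexed search: jump to the first-letter bucket, then scan it linearly.
 * Returns the word's position in words[], or -1 if it is not there. */
static int indexed_find(const char *w)
{
    int c = (unsigned char)w[0] - 'a';
    int i;
    if (c < 0 || c > 25)
        return -1;
    for (i = first_idx[c]; i < first_idx[c + 1]; i++)
        if (strcmp(w, words[i]) == 0)
            return i;
    return -1;
}

/* Binary search: halve the whole sorted array on every comparison. */
static int binary_find(const char *w)
{
    int lo = 0, hi = NWORDS - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(w, words[mid]);
        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Call build_index() once, then indexed_find("wisdom") and binary_find("wisdom") should both land on the same slot, and a word that isn't in the list comes back as -1 from either one. With a real dictionary the buckets get much longer, so how the two compare depends a lot on the word list and the machine.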
The third one was a puzzler - written as an answer to this problem on an online code challenge website:
http://www.spoj.pl/problems/TDKPRIME/
Which seems like a rather easy problem, but NOT SO FAST! The testing rig is a Pentium M based server, running at 850MHz, using a different compiler - of course.
So I put together some code, and it failed to meet the 10 second time limit, although it finished in under 3 seconds on my system, using Pelles C. Finally, someone wrote up another way to do it, and when I tested it on the server it did quite well, finishing in less than 8 seconds. I compiled this same program with Pelles C, and it took > 14 seconds to finish, on an i7 system @ 3.5GHz! That was shocking!
Obviously, there is some optimization that Pelles (or I) missed in this program, and because the program is so repetitive in nature, the cost is magnified many times over.
Imagine - a Pentium M server at 850MHz runs the program in 7.6 seconds, and an i7 at 3.5GHz runs it in > 14 seconds?? My own program runs in < 3 seconds on the same i7 system, but runs out of time (> 10 seconds) on the 850MHz server!!
The difference apparently is the compiler: Pelles C on my i7, and gcc on the Pentium M. I SO don't get this!!
Any ideas why that is, and what can be done to speed it up in the Pelles compiler?
This is the program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <limits.h>
#include <ctype.h>
/* Pack sieve array. Don't encode even numbers, and start the array
* with 3, the first non-even prime. If you count 1 as prime, it can
* be added as a special case like 2 below so it doesn't mess up the
* mapping.
* Next step is to pack from 1 character per prime to 1 bit per prime.
* Since longs are 32 bits, divide again by 32 to get the word index and mask
* off the low 5 bits (0-31) to get the amount to shift to find the bit
* which corresponds to a given prime.
*/
#define WORD_FROM_VAL(x) (((x) - 3)/ 64)
#define BIT_FROM_VAL(x) ((((x) -3)/ 2) & 0x1f)
#define PRIME_5_MILLION 86028121
int arr[5000000];
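/* One 32-bit word per 64 values; the +1 makes sure the word that holds
 * PRIME_5_MILLION itself is allocated (WORD_FROM_VAL of it is 1344189). */
int size=(PRIME_5_MILLION - 3)/64 + 1;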
int main() {
    double start,end;
    int q;
    start=clock();
    int i,j,m=0;
    int sieve_end = floor(sqrt(PRIME_5_MILLION)+0.1);
    unsigned long *sieve = calloc(size, sizeof(unsigned long));
    /* Up to 50,000 numbers to test, each up to 7 digits plus \r\n (or \n depending
     * on OS, but don't worry too much about an extra 50KB) */
    char *input = malloc (50001 * 9 + 1);
    char *output = malloc (50000 * 10 + 1);
    int curr_prime;
    for (curr_prime = 3; curr_prime <= sieve_end; curr_prime += 2)
    {
        if (!(sieve[WORD_FROM_VAL(curr_prime)] & (1UL << BIT_FROM_VAL(curr_prime))))
        {
            for(j = curr_prime*curr_prime; j < PRIME_5_MILLION; j+=curr_prime*2)
            {
                sieve[WORD_FROM_VAL(j)] |= 1UL << BIT_FROM_VAL(j);
            }
        }
    }
    //end=clock();
    //printf("\n%f",(end-start)/CLOCKS_PER_SEC);
    //start=clock();
    arr[m++] = 2; /* Special case for 2 */
    for (i=3; i<=PRIME_5_MILLION; i+=2) {
        if (!(sieve[WORD_FROM_VAL(i)] & (1UL << BIT_FROM_VAL(i))))
            arr[m++]=i;
    }
    free(sieve);
    //end=clock();
    //printf("\n%f\n",(end-start)/CLOCKS_PER_SEC);
    /* Read at most 50001*9 bytes so input[rc] below stays inside the buffer */
    int rc = fread(input, 1, 50001*9, stdin);
    input[rc] = '\0';
    const char *inp = input;
    char *outp = output - 1;
    char *outp_saved = output;
    unsigned in_num;
    unsigned out_num;
    unsigned out_num2;
    /* Skip first number */
    while (!isdigit(*inp)) inp++;
    while ( isdigit(*inp)) inp++;
    while (*inp)
    {
        /* Skip CR-LF */
        while (*inp && !isdigit(*inp)) inp++;
        /* Convert string to unsigned int */
        for (in_num = 0;*inp && isdigit(*inp); inp += 1)
        {
            in_num *= 10;
            in_num += *inp - '0';
        }
        if (in_num)
        {
            /* Turn prime count into prime value */
            out_num = arr[in_num-1];
            /* Move output pointer ahead, print in reverse */
            for (out_num2 = out_num; out_num2; outp += 1)
                out_num2 /= 10;
            /* At end of space for number, add a newline */
            outp_saved = outp+1;
            *outp_saved = '\n';
            /* Print digits in reverse order from least to most significant
             * so when they're read in normal order it is correct */
            for ( ; out_num; out_num /= 10)
                *outp-- = (out_num % 10) + '0';
            outp = outp_saved;
        }
    }
    fwrite(output, 1, outp - output + 1, stdout);
    return 0;
}
The data.txt for it (it's a console program, requiring a redirected data.txt filename for the kth prime requests) can be downloaded from here:
http://www.swoopshare.com/file/4b04eb22b8f708f1ee3f7a8af63a32a3/data.txt.html
with no need to sign up.
I'm using Pelles 6.50.8 rc #4 x64, and this was a release build with no debug info and maximize speed set.
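To narrow down where the time goes, it might help to time the sieve by itself, away from the input parsing and output. Here's a minimal sketch of that idea, not a drop-in replacement for the program. The names (WORD_OF, BIT_OF, words_needed, count_primes_packed, time_sieve) are made up for this example, it uses uint32_t instead of unsigned long so the word size doesn't depend on the compiler, and it assumes the limit is well below 2^32.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* An odd value x >= 3 maps to bit number (x - 3) / 2, 32 bits per word. */
#define WORD_OF(x) (((x) - 3u) / 64u)
#define BIT_OF(x)  ((((x) - 3u) / 2u) & 31u)

/* Number of 32-bit words needed so that WORD_OF(limit) is a valid index
 * (only meaningful for limit >= 3). */
static size_t words_needed(uint32_t limit)
{
    return (size_t)WORD_OF(limit) + 1;
}

/* Sieve the odd numbers 3..limit and return how many primes are <= limit,
 * counting 2. Returns 0 if limit < 2 or if the allocation fails. */
static uint32_t count_primes_packed(uint32_t limit)
{
    uint32_t *sieve, p, j, count = 1;   /* start at 1 for the prime 2 */

    if (limit < 2) return 0;
    if (limit < 3) return 1;
    sieve = calloc(words_needed(limit), sizeof *sieve);
    if (!sieve) return 0;

    for (p = 3; (uint64_t)p * p <= limit; p += 2) {
        if (sieve[WORD_OF(p)] & (UINT32_C(1) << BIT_OF(p)))
            continue;                   /* p is already marked composite */
        for (j = p * p; j <= limit; j += 2 * p)
            sieve[WORD_OF(j)] |= UINT32_C(1) << BIT_OF(j);
    }
    for (p = 3; p <= limit; p += 2)
        if (!(sieve[WORD_OF(p)] & (UINT32_C(1) << BIT_OF(p))))
            count++;

    free(sieve);
    return count;
}

/* CPU time in seconds, as reported by clock(), spent in one sieve run. */
static double time_sieve(uint32_t limit, uint32_t *count_out)
{
    clock_t t0 = clock();
    *count_out = count_primes_packed(limit);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_sieve(86028121u, &n) from a small main() and printing the result under both compilers would show whether the sieve loop alone accounts for the gap, or whether the difference is somewhere else (the fread/isdigit parsing, for example).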
The test result is pictured below. I'm Dave (although this program was not written by me, it had to be submitted under my name).