NO

Author Topic: frequency of ints in huge array  (Read 33086 times)

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: frequency of ints in huge array
« Reply #15 on: July 04, 2014, 06:18:06 PM »
This code takes just over 4 secs here.

Time the qsort separately. It will be just under 4 secs ;-)

Quote
If it's no good I'll stop. :)

It seems you are doing exactly what I did in my tests with assembler. So don't stop - unless we see something faster, it seems the way to go ;-)
« Last Edit: July 05, 2014, 03:03:45 AM by jj2007 »

Offline Bitbeisser

  • Global Moderator
  • Member
  • *****
  • Posts: 772
Re: frequency of ints in huge array
« Reply #16 on: July 05, 2014, 03:41:27 AM »
I would suggest just start counting from the top down, ...
Do you mean something like that?
Code: [Select]
maxcount = 0;
for (i=0; i<MAX; i++) {
  count = 0;
  for (k=i; k<MAX; k++)
    if (a[k] == a[i]) count++;
  if (count > maxcount) {
    maxcount = count;
    index = i;
  }
}
This is very slow for huge arrays! O(n²)
May be, that this can be optimized. The search can be terminated if maxcount > (MAX-i). But this depends on the data.
That's what I mentioned in my original reply. It highly depends on the spread of the values within the array.
I have done something similar years back, just need to dig it out, but I am not frequently at a computer, there's a long weekend here in the US of A and as I have no vacation time being self-employed, it's as close as any "time off" will be for a while...

Ralf

JohnF

  • Guest
Re: frequency of ints in huge array
« Reply #17 on: July 05, 2014, 08:36:51 AM »
This code takes just over 4 secs here.

Time the qsort separately. It will be just under 4 secs ;-)

Quote
If it's no good I'll stop. :)

It seems you are doing exactly what I did in my tests with assembler. So don't stop - unless we see something faster, it seems the way to go ;-)

I've tried two radix sorts and they were slower - maybe they were not great implementations, I don't know.

EDIT:

Changing the compare function slightly

Code: [Select]
int __cdecl CompareInt(const void * param1, const void * param2)
{
if(*(unsigned long int *)param1 < *(unsigned long int *)param2)
return 1;
else if(*(unsigned long int *)param1 > *(unsigned long int *)param2)
return -1;
else
return 0;
}

and setting maximize speed, the time comes down to 3.1 secs.
 
John
« Last Edit: July 05, 2014, 09:15:15 AM by JohnF »

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: frequency of ints in huge array
« Reply #18 on: July 05, 2014, 11:17:15 AM »
... setting maximize speed, the time comes down to 3.1 secs.

Sounds good. Which CPU? Mine is a Celeron M.

JohnF

  • Guest
Re: frequency of ints in huge array
« Reply #19 on: July 05, 2014, 11:26:59 AM »
... setting maximize speed, the time comes down to 3.1 secs.

Sounds good. Which CPU? Mine is a Celeron M.

Pentium(R) Dual-Core 2.7 GHz

John

czerny

  • Guest
Re: frequency of ints in huge array
« Reply #20 on: July 05, 2014, 01:43:47 PM »
If it's no good I'll stop. :)
What do you want to stop? Stop breathing? Stop coding? Stop posting in the Pelles C forum?
Don't do it, please! ;)

czerny

  • Guest
Re: frequency of ints in huge array
« Reply #21 on: July 05, 2014, 02:06:46 PM »
I would propose a different approach that is very efficient if the distribution is narrow (we have many ripetitions)
Try this (there are many (10000000 x 17)):
Code: [Select]
for (int i=0; i<sizeof(array)/sizeof(DWORD); i++)
  if (i%2) array[i] = 17; else array[i] = i;
« Last Edit: July 05, 2014, 02:46:51 PM by czerny »

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: frequency of ints in huge array
« Reply #22 on: July 05, 2014, 02:46:40 PM »

JohnF

  • Guest
Re: frequency of ints in huge array
« Reply #23 on: July 05, 2014, 04:31:01 PM »

JohnF

  • Guest
Re: frequency of ints in huge array
« Reply #24 on: July 05, 2014, 04:43:29 PM »
Czerny, if you can get Pelle's OpenMP stuff working with his quick_sort I believe it would speed up the sorting quite a bit.

However, I've not been able get it working with unsigned int's.

http://www.smorgasbordet.com/pellesc/zip/qsort.zip

The fastest is SORT_METHOD 3

In his example you can sort 20000000 floats in 1.6 secs.

John
« Last Edit: July 05, 2014, 06:16:50 PM by JohnF »

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: frequency of ints in huge array
« Reply #25 on: July 05, 2014, 05:01:34 PM »
If you are not too disgusted by assembler, try the attachment. It runs either on the commandline argument, or (if there are no args) on a file called FatArray.dat

JohnF

  • Guest
Re: frequency of ints in huge array
« Reply #26 on: July 05, 2014, 05:40:12 PM »
That's certainly fast, 1.6 secs in total here.

You could make a library that can be called from C?

All your best stuff in 32 and 64 bit libs. :)

John

« Last Edit: July 05, 2014, 08:14:54 PM by JohnF »

czerny

  • Guest
Re: frequency of ints in huge array
« Reply #27 on: July 05, 2014, 07:00:20 PM »
If you are not too disgusted by assembler, try the attachment. It runs either on the commandline argument, or (if there are no args) on a file called FatArray.dat
Hmm!

On my old machine I got for copying + sorting + finding:

jj2700  :  2843 ms
john     :  7780 ms (with qsort)
MySort : 37326 ms

I wonder why MySort is soooo slow!

Code: [Select]
void MySort(unsigned long arr[], unsigned long l, unsigned long r)
{
if (l<r) {
unsigned long h, k, last, first = l+1, p = arr[l];

last = l;
for (k=l+1; k<=r; k++) {
if (arr[k] < p) {
h = arr[k]; arr[k] = arr[first]; arr[first] = h;
last = first++;
}
}
if (last > l) {
arr[l] = arr[last]; arr[last] = p;
}

if (last > l) MySort(arr, l, last-1);
if (last < r) MySort(arr, last+1, r);
}
}

jj2700: Your code for sorting is not included. I have downloaded SetupMasmBasic and executed SetupMasmBasic.exe but I got only an empty dialog with 'last chance ...' as title and 5 files in a temp directory.

I have found a nonrecursiv asm version on the net: http://www.cs.hofstra.edu/~cscccl/csc111/qsort.asm

I would like to learn, howto make this Pelles C compatible. This should be easy in that case because der is a wrapper function for C.

Edit: Ok, I have changed .386p to .386 and deleted the .stack directive (don't know if this was right?)!

But it runs: 4587 ms

« Last Edit: July 05, 2014, 07:46:20 PM by czerny »

czerny

  • Guest
Re: frequency of ints in huge array
« Reply #28 on: July 05, 2014, 07:03:16 PM »
Czerny, if you can get Pelle's OpenMP stuff working with his quick_sort I believe it would speed up the sorting quite a bit.
I neither own a machine with some sort of multi core processor nor have I an operating system at home to run the newest Pelles C.  :'(

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: frequency of ints in huge array
« Reply #29 on: July 05, 2014, 08:33:03 PM »
I would propose a different approach that is very efficient if the distribution is narrow (we have many ripetitions)
Try this (there are many (10000000 x 17)):
Code: [Select]
for (int i=0; i<sizeof(array)/sizeof(DWORD); i++)
  if (i%2) array[i] = 17; else array[i] = i;
Yes, but this is not a narrow distribution anyway, you have 10M values equal and 10M values different...   :-\
To speed up the calculation you have to sort the array, one of the fastest ways (if you don't hate the math) is to sample at powers of 2 steps as described here...  :P
P.S. if the time is relevant, and in your case with so large arrays of data it is the case, the suggestion from JJ to use assembler is appropriate.  ;)

EDIT: if you reduce to 20K different values it takes 350ms on my machine:
Code: [Select]
for (int i=0; i<sizeof(array)/sizeof(DWORD); i++)
  if (i%1000) array[i] = 17; else array[i] = i;
« Last Edit: July 05, 2014, 08:41:31 PM by frankie »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide