
Author Topic: TikToken  (Read 97 times)

Offline HellOfMice

  • Member
  • *
  • Posts: 65
  • Never be pleased, always improve
TikToken
« on: May 13, 2024, 06:34:21 AM »
Hello,


Is it possible to interface TikToken with Pelles C? I have searched but did not find anything.
TikToken is a Python library.
If it is possible, how would I do it, please?


Thank you for your help
--------------------------------
Kenavo

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2098
Re: TikToken
« Reply #1 on: May 14, 2024, 12:27:12 PM »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

Offline Vortex

  • Member
  • *
  • Posts: 804
    • http://www.vortex.masmcode.com
Re: TikToken
« Reply #2 on: May 14, 2024, 10:11:23 PM »
Hello,

Are you referring to this project?

https://github.com/openai/tiktoken
Code it... That's all...

Offline HellOfMice

  • Member
  • *
  • Posts: 65
  • Never be pleased, always improve
Re: TikToken
« Reply #3 on: Yesterday at 12:36:27 PM »
Yes. It computes the tokens.
--------------------------------
Kenavo

Offline HellOfMice

  • Member
  • *
  • Posts: 65
  • Never be pleased, always improve
Re: TikToken
« Reply #4 on: Yesterday at 12:48:26 PM »
For OpenAI, a token is roughly a group of three or four characters. My solution is to divide the length of each word by three and add 1 if the word length is greater than three.
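
The rule described above can be sketched in plain C roughly as follows. This is only one reading of the description, not the actual TikToken algorithm: the function names `estimate_word_tokens` and `estimate_tokens` are made up for illustration, words are assumed to be separated by whitespace, and the floor of one token for very short words is an added assumption that the rule itself does not state.

```c
#include <ctype.h>
#include <stddef.h>

/* Rough token estimate for a single word of `len` characters.
 * One reading of the rule: len / 3 (integer division), plus 1 when
 * the word is longer than three characters.
 * Assumption (not in the post): a non-empty word is never counted
 * as zero tokens. */
static size_t estimate_word_tokens(size_t len)
{
    size_t tokens = len / 3;

    if (len > 3)
        tokens += 1;
    if (len > 0 && tokens == 0)
        tokens = 1;
    return tokens;
}

/* Splits `text` on whitespace and sums the per-word estimates.
 * Lengths are counted in bytes, so punctuation counts as part of a
 * word and multi-byte UTF-8 characters are over-counted. */
size_t estimate_tokens(const char *text)
{
    size_t total = 0;
    size_t len = 0;   /* length of the word currently being scanned */

    for (; *text != '\0'; ++text) {
        if (isspace((unsigned char)*text)) {
            total += estimate_word_tokens(len);
            len = 0;
        } else {
            ++len;
        }
    }
    /* Account for a final word that is not followed by whitespace. */
    return total + estimate_word_tokens(len);
}
```

Under these rules, `estimate_tokens("Hello world")` gives 4 (two per five-letter word). It is only an approximation, so the figure can differ noticeably from what the real tokenizer reports.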


OpenAI tokenizer https://platform.openai.com/tokenizer

--------------------------------
Kenavo

Offline WiiLF23

  • Member
  • *
  • Posts: 71
Re: TikToken
« Reply #5 on: Yesterday at 09:52:22 PM »
I would love to convert this, just to stick it to Python.

I’m not a fan of it; however, given the use of vectors and a range of “modules”, I would just grab the bindings and cave in.

A pure rewrite could use AVX/AVX2 or the SSE instructions (with CPU vendor detection, of course), so that alone is worth considering if you want a from-scratch implementation in C. Pelles C has vector support; you will find it in the project settings.

Basically, you would need the API documentation, and the rest is a matter of writing the C code so that it matches the OpenAI API documentation.

It looks like some work outside of the Python C bindings.
« Last Edit: Yesterday at 09:54:23 PM by WiiLF23 »