aMule Bug Tracker - aMule
View Issue Details
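ID: 0000885
Project: aMule
Category: Feature Request
View Status: public
Date Submitted: 2006-05-06 19:36
Last Update: 2008-07-09 16:13
Reporter: pcmaster
Priority: normal
Severity: feature
Reproducibility: always
Status: closed
Resolution: open
Operating System: Any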
0000885: Sorting Unicode Characters
I have written a function that can sort words in alphabetical order. It works with both ANSI and UTF-8 encodings.

It has not been exhaustively tested, but it seems to work. It is an old library I wrote in Turbo Pascal years ago to sort ANSI and ASCII characters, now with the ASCII support removed and UTF-8 support added.
To compare UTF-8 or ANSI strings, you just have to call the AlfaComp function, which has this form:

int AlfaComp (char *a, char *b, char juego, char func)


(a and b are char *, shown as s and t in the examples below; "juego" can be UTF8 or ANSI, constants defined in the library to indicate the encoding in use; "func" selects one of the three comparison functions available.)

Examples:

AlfaComp (s,t,UTF8,MAYOR_QUE)
returns 1 if s has to be ordered after t, and 0 if not.

AlfaComp (s,t,ANSI,MENOR_QUE)
returns 1 if s has to be ordered before t, and 0 if not.

AlfaComp (s,t,UTF8,IGUAL_QUE)
returns 1 if both strings are equal. The program treats the letters "á" and "a" as the same; if you don't like this, just use the standard strcmp function from string.h.
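Because AlfaComp answers one yes/no question per call, using it with the standard qsort needs a small adapter that turns two calls into a three-way result. The sketch below shows one way to do that. The constant values and the body of AlfaComp here are stand-ins invented so the example compiles on its own (plain byte order, not the library's ordering); only the signature is taken from this report. The name cmp_utf8 is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in constants: the real values are defined in alfaint.c and are
   NOT known here; these are assumptions for illustration only. */
enum { ANSI = 0, UTF8 = 1 };
enum { MENOR_QUE = 0, IGUAL_QUE = 1, MAYOR_QUE = 2 };

/* Stand-in with the same signature as the library function. It uses plain
   byte order, so it does not reproduce the library's accent handling. */
int AlfaComp(char *a, char *b, char juego, char func)
{
    int r = strcmp(a, b);
    (void)juego;                       /* encoding is ignored by the stand-in */
    if (func == MENOR_QUE) return r < 0;
    if (func == MAYOR_QUE) return r > 0;
    return r == 0;
}

/* qsort comparator for an array of char *: maps the 0/1 answers of
   AlfaComp to the negative / zero / positive result qsort expects. */
int cmp_utf8(const void *pa, const void *pb)
{
    char *a = *(char *const *)pa;
    char *b = *(char *const *)pb;
    if (AlfaComp(a, b, UTF8, MENOR_QUE)) return -1;
    if (AlfaComp(a, b, UTF8, MAYOR_QUE)) return 1;
    return 0;
}
```

With the real library linked in place of the stand-in, a call such as qsort(words, n, sizeof words[0], cmp_utf8) would sort an array of char * using the library's ordering.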

Note: The letter ç is a special case. It is normally ordered as a c, but if two words differ only in this letter, the ç is ordered AFTER the c. Because of this, these words are in the correct order:

placa
plaça
plaçada
placar
plaçar
placard

The function takes this into consideration and returns the correct value. Note: this applies to the MAYOR_QUE and MENOR_QUE functions; the IGUAL_QUE function always reports that two such words are different.
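The ç rule amounts to a two-level comparison: compare the words with every ç treated as c, and only if that comparison ties let the first c/ç difference decide. The following is a minimal sketch of that idea for UTF-8 input, not the code from alfaint.c. It handles only ç (bytes 0xC3 0xA7) and no other accented letters, and the names next_unit, compare_cedilla and cmp_words are hypothetical.

```c
#include <stdlib.h>

/* Read one sorting unit at *p and advance *p past it.
   Returns the primary key ('c' for "ç", otherwise the byte itself, 0 at
   the end of the string) and sets *ced to 1 when the unit was "ç". */
static int next_unit(const unsigned char **p, int *ced)
{
    const unsigned char *s = *p;
    if (s[0] == 0xC3 && s[1] == 0xA7) {   /* UTF-8 encoding of ç */
        *ced = 1;
        *p = s + 2;
        return 'c';
    }
    *ced = 0;
    if (s[0] != 0)
        *p = s + 1;
    return s[0];
}

/* Three-way compare: primary keys first; on a tie, the word that has a
   plain c at the first c/ç difference sorts first. */
int compare_cedilla(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    int tie = 0;                          /* used only if all keys match */
    for (;;) {
        int ca, cb;
        int ka = next_unit(&pa, &ca);
        int kb = next_unit(&pb, &cb);
        if (ka != kb) return ka < kb ? -1 : 1;
        if (ka == 0) return tie;          /* both strings ended together */
        if (tie == 0 && ca != cb) tie = ca ? 1 : -1;
    }
}

/* qsort adapter for an array of const char * */
int cmp_words(const void *x, const void *y)
{
    return compare_cedilla(*(const char *const *)x, *(const char *const *)y);
}
```

Sorting the six words above with qsort and cmp_words should reproduce the order shown, since "plaça" ties with "placa" on the primary key and then falls behind it on the tie-break.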

Other interesting functions in the library:

void QuitarAcentos (char *s, char *e, char juego)

copies the e string into the s string, replacing accented letters with their unaccented equivalents. For example, á and à are replaced by a. juego can be UTF8 or ANSI.
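For UTF-8 input, this kind of substitution can be driven by a small table, because the accented Latin-1 letters U+00C0 to U+00FF are all encoded as the byte 0xC3 followed by a byte in the range 0x80 to 0xBF. The sketch below is an independent illustration, not the library's code; the table BASE and the function strip_accents_utf8 are hypothetical names, and the choice to leave ç, ñ and the non-letter symbols untouched is an assumption.

```c
#include <stddef.h>

/* Base letter for U+00C0..U+00FF, indexed by (second byte - 0x80).
   '.' marks code points this sketch leaves untouched (Æ, Ç, Ð, Ñ, ×, Ø,
   Þ, ß and their lowercase counterparts, plus ÷). */
static const char BASE[65] =
    "AAAAAA..EEEEIIII..OOOOO..UUUUY.."
    "aaaaaa..eeeeiiii..ooooo..uuuuy.y";

/* Copy src into dst, replacing accented vowels by their base letter.
   The result is never longer than src, so dst needs strlen(src) + 1 bytes. */
void strip_accents_utf8(char *dst, const char *src)
{
    const unsigned char *s = (const unsigned char *)src;
    while (*s) {
        if (s[0] == 0xC3 && s[1] >= 0x80 && s[1] <= 0xBF
            && BASE[s[1] - 0x80] != '.') {
            *dst++ = BASE[s[1] - 0x80];   /* two bytes in, one byte out */
            s += 2;
        } else {
            *dst++ = (char)*s++;          /* copy everything else as-is */
        }
    }
    *dst = '\0';
}
```

Like QuitarAcentos, the sketch takes the destination first and the source second.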

void Mayuscula (char *s, char juego)
converts the first letter of a string to uppercase

void Minuscula (char *s, char juego)
converts the first letter of a string to lowercase

void MaysCadena (char *s, char juego)
converts an entire string to uppercase

void MinsCadena (char *s, char juego)
converts an entire string to lowercase
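As an illustration of why case conversion needs to know the encoding, here is a rough sketch of in-place uppercasing for UTF-8 text limited to ASCII and the Latin-1 accented letters. It is not the library's implementation, and utf8_upper_latin1 is a hypothetical name. It relies on the fact that the lowercase letters U+00E0 to U+00FE (except ÷, U+00F7) sit exactly 0x20 above their uppercase forms, so only the second byte of the two-byte sequence changes. It assumes the default "C" locale for toupper.

```c
#include <ctype.h>

/* Uppercase a UTF-8 string in place. Handles ASCII letters and the
   Latin-1 letters à..þ; every other character is left unchanged
   (including ÿ, whose uppercase form lies outside this range). */
void utf8_upper_latin1(char *s)
{
    unsigned char *p = (unsigned char *)s;
    while (*p) {
        if (p[0] == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBE && p[1] != 0xB7) {
            p[1] -= 0x20;                 /* e.g. ç (C3 A7) -> Ç (C3 87) */
            p += 2;
        } else if (*p < 0x80) {
            *p = (unsigned char)toupper(*p);
            p++;
        } else {
            p++;                          /* other bytes: leave as they are */
        }
    }
}
```

The byte length of the string does not change, which is what makes the in-place conversion safe for this limited range.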

char MeteChar (char *s, int c, char juego)
inserts the c character (c is an int containing the character code) in the string s, and returns the number of bytes used by the character in the string (always 1 for ANSI, 1 to 4 for UTF-8).
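The "1 to 4 bytes" figure follows from how UTF-8 splits a character code across bytes. The sketch below shows the standard encoding rules; it is not taken from alfaint.c, and the name utf8_put and its return value of 0 for codes that cannot be encoded are assumptions.

```c
/* Write code point c at s as UTF-8 and return the number of bytes written
   (1 to 4), or 0 if c is a surrogate or above U+10FFFF.
   No terminator is added; s must have room for up to 4 bytes. */
int utf8_put(char *s, unsigned long c)
{
    unsigned char *p = (unsigned char *)s;
    if (c < 0x80) {                       /* 0xxxxxxx */
        p[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        p[0] = (unsigned char)(0xC0 | (c >> 6));
        p[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        p[0] = (unsigned char)(0xE0 | (c >> 12));
        p[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000) {                   /* 11110xxx followed by 3 x 10xxxxxx */
        p[0] = (unsigned char)(0xF0 | (c >> 18));
        p[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the code 0xE7 (ç) is written as the two bytes 0xC3 0xA7.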

int utf24bit (char *s, char juego)
reads a character from s and returns it as an int. s can be a 1-byte ANSI character or a 1- to 4-byte UTF-8 character.
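Reading a character back is the reverse operation: strip the marker bits from each byte and join the remaining bits. A minimal sketch for the UTF-8 case follows. It is not the library's code, utf8_get is a hypothetical name, and it assumes the input has already been checked for validity, since it does not inspect the continuation bytes.

```c
/* Decode the UTF-8 character starting at s and return its code point.
   Returns -1 if the first byte cannot start a character.
   Assumes the sequence is complete and well formed. */
long utf8_get(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80)                      /* single byte */
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)            /* two bytes */
        return ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if ((p[0] & 0xF0) == 0xE0)            /* three bytes */
        return ((long)(p[0] & 0x0F) << 12)
             | ((long)(p[1] & 0x3F) << 6)
             | (p[2] & 0x3F);
    if ((p[0] & 0xF8) == 0xF0)            /* four bytes */
        return ((long)(p[0] & 0x07) << 18)
             | ((long)(p[1] & 0x3F) << 12)
             | ((long)(p[2] & 0x3F) << 6)
             | (p[3] & 0x3F);
    return -1;                            /* continuation or invalid byte */
}
```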

int ValidaUTF (char *s)
Tests whether s can be a valid UTF-8 sequence of bytes. (For example, a UTF-8 string cannot have two bytes >192 one right after the other.)
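A validity check of this kind can be sketched as a single pass that reads each lead byte, requires the right number of continuation bytes (128 to 191) after it, and rejects overlong forms. This is an independent sketch of standard UTF-8 validation, not the logic of ValidaUTF, which may be more or less strict; utf8_valid is a hypothetical name.

```c
/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
int utf8_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        int n;                            /* continuation bytes expected */
        unsigned long cp, min;            /* decoded value, smallest legal value */
        if (*p < 0x80) { p++; continue; }
        else if ((*p & 0xE0) == 0xC0) { n = 1; cp = *p & 0x1F; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { n = 2; cp = *p & 0x0F; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { n = 3; cp = *p & 0x07; min = 0x10000; }
        else return 0;                    /* stray continuation or bad lead */
        p++;
        while (n-- > 0) {
            if ((*p & 0xC0) != 0x80)      /* also stops at the terminator */
                return 0;
            cp = (cp << 6) | (*p++ & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                     /* overlong, too large, or surrogate */
    }
    return 1;
}
```

Two lead bytes in a row, as in the example above, fail at the continuation check because the second lead byte does not fall in the range 128 to 191.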

char NumBytes (char *s, char juego)
reads the first byte of a character and returns the number of bytes the character must have if it is a valid UTF-8 sequence. For example, if the first byte of a UTF-8 character is >= 240, the next three bytes must each be in the range 128-191.
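The mapping from first byte to sequence length is fixed by the UTF-8 format and fits in a few range checks. The sketch below uses the ranges from the current UTF-8 definition, under which the bytes 192, 193 and 245 to 255 never start a character; NumBytes itself may classify those bytes differently. The name utf8_seq_len is hypothetical.

```c
/* Return the total length in bytes (1 to 4) of the UTF-8 character that
   starts with the byte lead, or 0 if lead cannot start a character. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)                  return 1;   /* 0..127: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;   /* 194..223 */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;   /* 224..239 */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;   /* 240..244 */
    return 0;   /* 128..193 and 245..255 are not valid first bytes */
}
```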

size_t UTF8long (char *s, char juego)
returns the number of characters (not bytes) in a UTF-8 or ANSI string.
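For UTF-8 the character count differs from strlen, which counts bytes. One common way to count characters, sketched here independently of the library, is to count every byte that is not a continuation byte (128 to 191). The name utf8_length is hypothetical, and the sketch assumes the string is valid UTF-8.

```c
#include <stddef.h>

/* Count the characters in a valid UTF-8 string by skipping
   continuation bytes, which all have the bit pattern 10xxxxxx. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the word "plaça" encoded in UTF-8 this gives 5, while strlen reports 6 because ç takes two bytes.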

The library also defines constant names for characters >128.

Note: this is my first useful C program. If you find any bugs, please send me a message. Thanks.
No tags attached.
c alfaint.c (9,375) 2006-05-06 19:37
https://bugs.amule.org/file_download.php?file_id=149&type=bug
c ejemplo.c (1,910) 2006-05-06 19:39
https://bugs.amule.org/file_download.php?file_id=150&type=bug
txt gples.txt (22,474) 2006-05-06 20:13
https://bugs.amule.org/file_download.php?file_id=151&type=bug
c alfaint.c (9,321) 2006-05-06 20:49
https://bugs.amule.org/file_download.php?file_id=152&type=bug
c alphaint.c (8,725) 2006-05-08 15:24
https://bugs.amule.org/file_download.php?file_id=153&type=bug
c test.c (2,056) 2006-05-08 15:24
https://bugs.amule.org/file_download.php?file_id=154&type=bug
txt gpl.txt (19,941) 2006-05-08 15:30
https://bugs.amule.org/file_download.php?file_id=155&type=bug
c alphaint.c (6,039) 2006-11-06 21:36
https://bugs.amule.org/file_download.php?file_id=169&type=bug
? alphaint.h (4,264) 2006-11-06 21:37
https://bugs.amule.org/file_download.php?file_id=170&type=bug
c test.c (2,346) 2006-11-06 21:38
https://bugs.amule.org/file_download.php?file_id=171&type=bug
? compile.sh (213) 2006-11-06 21:39
https://bugs.amule.org/file_download.php?file_id=172&type=bug
Issue History
2006-05-06 19:36  pcmaster  New Issue
2006-05-06 19:36  pcmaster  Operating System => Any
2006-05-06 19:37  pcmaster  File Added: alfaint.c
2006-05-06 19:39  pcmaster  File Added: ejemplo.c
2006-05-06 19:39  pcmaster  Note Added: 0001972
2006-05-06 20:13  pcmaster  File Added: gples.txt
2006-05-06 20:49  pcmaster  File Added: alfaint.c
2006-05-06 20:50  pcmaster  Note Added: 0001973
2006-05-07 22:38  Kry  Note Added: 0001974
2006-05-08 00:19  pcmaster  Note Added: 0001975
2006-05-08 15:21  pcmaster  Note Added: 0001976
2006-05-08 15:24  pcmaster  File Added: alphaint.c
2006-05-08 15:24  pcmaster  File Added: test.c
2006-05-08 15:25  pcmaster  Note Edited: 0001976
2006-05-08 15:30  pcmaster  File Added: gpl.txt
2006-11-06 21:36  pcmaster  File Added: alphaint.c
2006-11-06 21:37  pcmaster  File Added: alphaint.h
2006-11-06 21:38  pcmaster  File Added: test.c
2006-11-06 21:39  pcmaster  File Added: compile.sh
2006-11-06 21:40  pcmaster  Note Added: 0002148
2008-07-09 16:13  Wuischke  Status: new => closed

Notes
(0001972)
pcmaster   
2006-05-06 19:39   
See bug 472 :)
(0001973)
pcmaster   
2006-05-06 20:50   
Updated alfaint.c. Deleted unnecessary ; characters and used constants in one function.
(0001974)
Kry   
2006-05-07 22:38   
If you want it to be included in the aMule distribution so it can be used, you gotta change all those Spanish words to English :)
(0001975)
pcmaster   
2006-05-08 00:19   
Translating...
(0001976)
pcmaster   
2006-05-08 15:21   
(edited on: 2006-05-08 15:25)
Translation finished :)

Function names changed:

AlfaComp -> int AlphaComp (char *a, char *b, char encoding, char cond)

QuitarAcentos -> void EraseAccents (char *s, char *e, char encoding)

Mayuscula -> void UpperCase (char *s, char encoding)

Minuscula -> void LowerCase (char *s, char encoding)

MaysCadena -> void StringToUpper (char *s, char encoding)

MinsCadena -> void StringToLower (char *s, char encoding)

MeteChar -> char PutChar (char *s, int c, char encoding)

ValidaUTF -> int UTFvalid (char *s)

and a few internal functions.

The constant names have also changed, because these names are mnemonics for the letter and the symbol on it: accent, tilde, etc. The filenames have changed as well.

(0002148)
pcmaster   
2006-11-06 21:40   
Update: Bugs in EQUAL_TO function in alphaint.c and in test.c corrected. Added a header file and a compile script to test the program.