Lab_Interning: String Interning Implementation
J. Nelson Amaral
Motivation
298 no-space characters
199 no-space characters
33% reduction in size
Hashing Algorithm
MinutesPerHour
ASCII Codes: 77
Hashing Algorithm
ASCII Codes:
MinutesPerHour
Hashing Algorithm
ASCII Codes:
MinutesPerHour
77 105 110
Hashing Algorithm
ASCII Codes:
MinutesPerHour
77 105 110 117 …
CS Help, Email: tutorcs@163.com
String Interning Table
Each one of these entries in the table is called a bucket
Create immutable copy Place address of immutable copy into the table entry.
Return unique ID
000 … 000
程序代写 CS代考 加微信: cstutorcs
String Interning Table
In this slides we will refer to the unbounded data structure used for the overflowing of a bucket as a tank
Search tank for string.
If not found:
return unique ID.
0x078E 7400 0x078E 7440
0x078E 7348
0x0808 400C
0bbbb…bb is the address of an interned string
create immutable copy insert string into tank. return unique id
1bbbb…bb
String Interning Table
Compare input with interned string.
If it matches:
• return unique id
If it does not match:
0bbbb…bb is the address of an interned string
create a new tank.
add interned string address create immutable copy add copy address
replace bucket content return unique ID
0xhhhh hhhh 0xnnnn nnnn
01bcbcbccb…cbcb
Why this works?
0xfFFF FFFF
0x8000 0000 0x7FFF FFFF
0x0000 0000
Kernel Space
internString a0: address of a string to be interned
Parameter: Return Value:
a0: unique ID for the string
Strings with same value must always return the same ID
independent of their memory address.
Must create immutable copy if string was not interned before.
getInternedString a0: unique interned string ID
Parameter: Return Value:
a0: string address if string was interned before. a0: 0 if string was not interned before.
internFile a0: pointer to a file in mutable memory
Parameter: Return Value:
a0: address of list of interned strings a1: number of identifiers in the list
strings are separated by one or more spaces or line feed characters end of file is signaled by an End of Transmission (EOT) character must make immutable copies of each string.
Your implementation will be presented with at most 128 different strings.
Computer Science Tutoring