## Searching techniques: Sequential search: O(n)
- Navigate this page:
- General characteristics of a hashing function
- Hash codes for primitive types
- Hashing method: Digit Analysis/Extraction
- Hashing Method: Mid-Square Method
- Converting a Hash Code into an Index for the Hash Table
- Open Addressing with a Linear probe
- Example: Adding and Retrieving (from Carrano)
- A Potential Problem with Open Addressing
The last row is where it differs from the inclusive or. ## Hashing method: Digit Analysis/ExtractionTABLE of frequency of occurrence of the digits 0..9 for 2800 five-digit keys: digit: 1 2 3 4 5 0 2026 250 218 1012 260 1 618 395 391 185 382 2 128 263 389 299 271 3 23 298 330 52 302 4 5 298 330 52 302 5 335 299 101 387 6 303 339 18 199 7 289 308 124 301 8 267 267 999 245 9 400 259 0 353 ## Analysis:"By this method a frequency count is performed in regard to the number of times each of the 10 digits occurs in each of the positions included in the record key. For example, consider the following table showing the number of times each digit occurred in a five-position numeric key for 2,800 records. In this tabulation we can observe that digits 0..9 occur with approximately uniform distribution in key positions 2, 3, and 5; therefore, if a 3-digit address were required, the digits in these three positions in the record keys could be used. Given that there are 2,800 records, however, a four-digit address would be required. Suppose we desire the first digit to be a 0, 1, 2, or 3 only. Such assignment can be made with about equal frequency for each digit by using a rule such as the following: assign a 0 when digits in positions 2 and 3 both contain odd numbers, a 1 if position 2 is odd and position 3 is even, a 2 if position 2 is even and position 3 is odd, or a 3 if positions 2 and 3 both contain even numbers. Thus, the address for key 16258 would be 3628: the 3 from the fact that positions 2 and 3 both contain even numbers and the 628 from key positions 2, 3, and 5. Other rules for prefixing additional digits can be formulated for different circumstances. In any event, the digit analysis method relies on the digits in some of the key positions being approximately equally distributed. If such is not the case, the method cannot be used with good results. " (Philippakis) ## Hashing Method: Mid-Square MethodThe record key is multiplied by itself, and the product is truncated from both left and right so as to form a number equal to the desired address length. Thus, key 36,258 would be squared to give 1,314,642,564. To form a four-digit address, this number would be truncated from both the left and the right, resulting in the address 4642. ## Converting a Hash Code into an Index for the Hash TableThe most common way to convert a hash code into an index into a hash table is to divide the hash code by the size of the hash table and take the remainder: index = key % tableSize; ## Resolving collisionsThe problem with hashing is that we are mapping a LARGE key space into a SMALL address space. Whenever we do this, we are bound to have COLLISIONS (sometimes called clashes). When a collision occurs, we have a problem. We still have to put the item into the table (or dictionary), but we can't put it into the location that it hashed to because it's already occupied. So the challenge is to figure out where to put it. COLLISION/CLASH: when 2 records hash into the same address. So, how do you resolve the problem??? You need to make sure your table is big enough for all of the data that you will want to put into it. That means that you must have some idea of how much data you will be storing. If we leave more than enough space, all we have to do is find an extra space. One solution to this problem is a ## Open Addressing with a Linear probeLINEAR PROBE: a search of subsequent memory locations until an open slot is found.
A linear probe simply looks for the next available location. If the key hashes into location Note that if we are at the end of the array, we must "wrap around" to position 0 and continue from there. That is, we treat the array as if it were circular. For a linear probe to work, at creation time, mark each hash table element as being unused (e.g. by putting the value Now assume that two values hash into the same location. The first one will get to occupy that location. The second one, however, must look at the next few (hopefully) places and find an unoccupied place.
## Example: Adding and Retrieving (from Carrano)Assume that all 4 of the following data/key pairs hash to the same location (52): -
"555-1214", "150 Main Street" -
"555-8132", "75 Center Court" -
"555-4294", "205 Ocean Road" -
"555-2072", "82 Campus Way"
We will put "150 Main Street" into location 52. We will put "75 Center Court" into location 53 (if it is available). We will put "205 Ocean Road" into location 54 (if it is available). We will put "82 Campus Way" into location 55 (if it is available). Now, consider what happens when we want to retrieve the data associated with phone number 555-2072. Our hash function will take us to location 52! How do we know that this is the correct data? The only way that you can know that you have retrieved the correct data is if you also store the key with the data!
## Example: DeletingSuppose we remove the objects in locations 53 and 54 by replacing their values with null. Now when we go to retrieve the value for the key "555-2072", if we stop when we find a null value, we won't find it!
There are -
Spaces that have never been occupied (and should end a search), and -
Spaces that have been occupied (and should NOT end a search).
Therefore, when we remove an item from a hash table, we should NOT replace it with null, but we should mark it in some way as being available.
## ClusteringWhen you add records and a lot of them end up in the same part of the table, this can severely slow down your searching. You will end up with areas of your table that have clusters of filled entries, and other areas of your table that have very few entries. This is called clustering.
A way to scatter out the records that hash into the same location is to use a ## A Potential Problem with Open AddressingIt is possible with the methods that we just described that, after many additions and deletions, ALL of the records in a table will be marked either as occupied or available and that no table entries are marked as empty, or null. Then, if we have a clash and have to search for another location to put the data, we may actually end up searching the entire table! This is not good – it will be very slow.
## Separate ChainingAnother alternative is to allow more than one data item to go in a single table entry. When you do this, each location in the table is called a bucket and each table entry points to a linked list of data items that all hashed into this spot.
## The Load FactorOur first example used a perfect hashing function – there were never any collisions, and never any unused table locations. In real life, this never happens. We will have collisions, but we want to minimize the number of collisions that will occur. There are several things we can do to make this more likely: -
Make the table larger than the number of keys (so there will ALWAYS be unused locations). -
Try to develop a hashing function that will truly evenly distribute the keys over the hash table. -
Use a prime number for the size of the hash table.
## Hash Table SizeHow big should you make the hash table? If you use If you use It is also recommended that the size of the hash table be a Advantage of Hashing: Lookup is fast.
Disadvantage of Hashing: The data cannot be retrieved in sorted order.
Download 35.19 Kb. Share with your friends: |