Thursday, April 15, 2010
-2o1o- ARP Poisoning
When i think of writing descriptive/procedural oriented articles like what I've done in the previous article, i cringe in horror; for a simple reason that they bore me easily when it comes to writing them. On the contrary, my blood pressure tends to mount upwards as an effect of the excitement I always get by just thinking how much fun and challenge I would get in the process of writing a conceptual based article like this. Note that this article is meant for educational purposes only, I am not responsible for any loss incurred or harm caused from the practice of the skills/techniques taught.
Imagine one day when you are seated in Starbucks with a cup of your favorite Espresso and laptop by your side, listening to a business plan (which you deem not viable) of a young local entrepreneur who decided to franchise his business overseas. Boredom strikes and you decided to logon to Friendster via the wireless network to perform some social networking activities, until he finally notices your show of disinterest and leaves in utter disappointment. The next moment some anonymous guy taps your shoulder from the back and says “Hey I have your Friendster account!”
Note : You might be wondering why i chose to use a less popular site Friendster over high popularity sites like Facebook, Hotmail, and Yahoo. This is because Friendster is by far the first well established site I’ve come across, without encryption implemented in its authentication system.
The Nature of Network
Before proceeding to the interesting part, let us first fortify our knowledge on how the network transmits packets and how the router route the packets from the source to the destination. For people who do not wish to read to understand the underlying concept, you are at liberty to skip reading everything until the section “Demonstration of ARP Poisoning Attack”.
Consider the following example, user A is a home computer user with three computers connected to a router (192.168.1.1/24). One day he picked up a web designing course and decided to make a homepage of himself and host it locally. When people ask user A for the address of his newly created homepage, he would give them his IP address or domain name and people would enter the IP address/domain name in their browser, and the browser would send a SYN packet to initiate the TCP 3-way handshake process.
Everything works smooth without flaw until this point, however problem arises when the packet reaches the user A's router. His router would not be able to reach the destination because the packet contains only WAN logical IP address for itself to reach the router; it does not tell where the router should send itself to. A simple workaround for this is to send the packet to all computers connected to the router, but this apparently is not the most ideal solution, as it floods the whole network with unnecessary traffic and eventually causes traffic congestion or even Denial of Service (DoS) attack within the network.
Port Forwarding
This is where the port forwarding makes its debut in this article. Port forwarding is a NAT extension whereby the packets are directed to the intended destination by referring to the port mapping table. Of course, before everything takes place, the user has to map the port to its own logical address in order for the router to understand where to direct the packets to. Continuing from the above example, in order for the router to know how to route traffics coming to port 80 to user A's computer, it's a must for user A to map the port to his computer's IP address. After user A has done with the mapping, the router would, in essence, forward every traffic coming to port 80 to his IP address, which we assume as192.168.1.13. This also explains why a router can only have one device mapped to one port. The router however, still does not know how to reach the recipient simply by just knowing the address.
Note: A router uses NAT (Network Address Translation) to translate addresses initiated locally whereas the NAT extension “port forwarding” is used to translate address initiated outside the local network. In this article I chose to use port forwarding to explain the example because I think readers would prefer to read port forwarding over NAT.
Address Resolution Protocol
Now please allow me to introduce you the ARP (Address Resolution Protocol). ARP is a protocol used to determine the physical address (also known as MAC address) when only the logical address(IP address) is known. When a device wishes to locate another device on the same network, it would send an ARP request to every device with the message "Who has this IP Address?” and it expects the correspondent device to send back an ARP reply with the message "I have this IP address and this is my MAC address". Continuing from the above example again, the router would send every devices on the local network, says "Who has the IP Address 192.168.1.13?" and the user A's computer would send back an ARP reply "I have the IP address 192.168.1.13 and my MAC address is 00-52-FF-07-82-93". The router then sends the packet to user A’s computer and the connection ends gracefully when both parties have finished sending data and decide to terminate the session.
ARP Poisoning
Finally it comes to the wonderful part, suppose that user B, who is someone with malicious intent, happens to be living near user A and has access to his wireless network. He connects to the network, tells the router by sending an ARP reply indicating that he is 192.168.1.13(user A's IP address) and does the same to user A, indicating that he is the router (192.168.1.1). Interestingly, the router unfortunately does not know how to verify the authenticity of the ARP reply. It would only blindly thinks that user B is 192.168.1.13 and forward every packet that which is intended for 192.168.1.13 to user B. Likewise, the user A would also think that user B is the router and it would also forward every outbound packet to user B. With the possession of these packets, user B can view the contents and even launch a Man in the Middle Attack (MITM Attack) against user A.
(Screenshot depicts the network activity when ARP spoofing/poisoning takes place where 192.168.123.145,192.168.123.151,192.168.123.254 are the victims and 192.168.123.123 with the MAC address 00:0c:29:bd:8c:3a is the attacker)
User B simply sends the original packets to user A and it would not raise suspicion for user A because he is essentially getting the data that he wants (unless it has been modified).
Demonstration of ARP Poisoning Attack
An ARP poisoning attack will be demonstrated below using Cain & Abel, which can be downloaded from http://www.oxid.it/cain.html
1. Activate the sniffer
2. Start ARP Poisoning
3. Click Sniffer tab and click the plus sign to scan for all MAC Addresses in the same subnet.
4. Click ARP tab and click the plus sign to add host to be ARP poisoned. Now we want to intercept the connection between the computer 192.168.123.151 and the router, which is 192.168.123.254. So we would select 192.168.123.151 and 192.168.123.254 from the list.
you should get something similar to the screenshot below.
5. On another computer, I innocently enter my username and password and log into the Friendster website.
6. Click the passwords tab and select HTML, and there you see the username and password. taykahhoe89@hotmail.com and KLSDJ9S, mission accomplished.
Note: For wireless adapters, turn off the promiscuous mode if you fail to detect any MAC addresses. This should fix your problem, but you won’t be able to spoof your MAC address with promiscuous mode turned off.
DotAAssistant + AudiSim + cPrOxCMS
Most of the time when I run my IDE, the image in my mind would turn monochromic; it happens especially when I look into the list of projects I developed over the past 3 years. At that particular instant I would rekindle the sweet past I had with each and every of the programs I developed, and my heart would go filled with guilt because I feel that it’s a sin to keep programs with superior level of built-in intelligence but not touching them. So I decided to release them to the public, and do bear in mind that you are free to improve or use any part of my releases below.
Here comes the most favorable moment, I hereby present you three of my wonderful releases:
-2o1o- DotA Assistant (2009)
DA and the Fuzzy Logic
DA is a program written based on the fuzzy logic, it operates by the means of what the human perceives as true. In a fuzzy logic program, things are evaluated based on the human's reasoning, which is approximate, rather than precise.
For instance, in this program I define a player as "absent" if the DA fails to locate the RGB of players after 8 scans. However, the concept of absence may differ between each individual. Another programmer who develops a similar program may see the phenomenon “absence” as player who goes disappear from minimap for longer than 10 seconds. Each of us uses a different way of defining the absence of a player. Despite the fact some people who may think our statements are incorrect; we could arguably say that we are both correct, essentially. This in turn, implies that fuzzy logic relies on the mathematical model of the vagueness phenomenon, on the basis of degree of truth.
The picture below illustrates one of the possible outcomes in the process of detecting the player’s changes of state (ea: from absent to present), with the following conditions:
-Missing counter increases by one per scan if DA fails to detect one of the possible combinations of RGB for the player in the minimap
-Missing counter resets back to 0 whenever DA detects one of the possible combinations of RGB for the player in the minimap
-DA reports the player as out of sight as soon as the missing counter for the player reaches more than 4
-DA declares the player as missing as soon as the missing counter for the player reaches more than 8
The picture below illustrates one of the possible outcomes in the process of determining the existence of a player
We assume that
-Player 1 = Player with strong RGB = Player with color which is NOT susceptible for DA to be misled another color for it (ea: red,blue,white)
-Player 2 = Player with weak RGB = Player with color which is susceptible for DA to be misled another color for it (ea: pink and teal)
-Player 1 and 2 exists in the game
-Player 3 does not exist in the game(does not exist in minimap)
with the following conditions:
-Registration counter increases by one whenever DA detects one of the possible combinations of RGB for the player
-DA registers the player as soon as the registration counter reaches more than 5
Classes in DA
DA contains 13 classes. 4 of the classes were used solely for testing purpose, 7 contains algorithms used for various functions, 1 class contains predefined sets of variable for character recognition, and the last class contains Main, used for executing DA. The brief descriptions of significant classes in DA are as follows:
CharRecognition
Contains the method definition for the size, pattern points and precedence of the characters. Also contains the logic for character scanner and logic..
CharRecognitionHandler (there's one faux paus committed, the CharRecognition which contains the method definition should be named CharRecognitionHandler whereas the CharRecognitionHandler which contains the predefinition of variable should be named CharRecognition)
Contains only the predefined sets of variable for character recognition
DBHandler
Contains various methods involving the interaction with DA database
DotAAssistant
Contains fuzzy logic to determine what the DA should do if the player changes its state, ea: from absent to present
DotALogic
Contains mathematical logic involving formulas used to calculate attributes of a player
FileOperator
Contains various methods involving file I/O operations
ItemRecognition (Incomplete)
Contains the methods used to retrieve RGB codes of different items to form a combination of pattern points
Player
Contains the definition for the player object
PlayerHandler
Contains various methods which handle the changes of player state
AudiSim (2009)
AudiSim is a simulator for the game Audition Online, which was written in Java. It serves as a training tool for the user to improve by a larger margin. Do bear in mind that in practice the effect may vary, depending on the individual attributes.
I’ve also written a replay viewer for AudiSim, a program which has the ability to load the AudiSim replay files. In order for the replay viewer to load the replay file, you would need to insert the directory of the replay file in the command line (by right clicking run.bat -> edit).
The user configuration file(config.ini)
AudiSim first retrieves the user configuration from the file passed to the ini constructor (Main.java) . It thereafter passes the retrieved user configuration to the AudiSim Interface constructor.
Next, AudiSim instantiates/initializes the necessary objects and variables for the GUI to be loaded. At this point the database handler(DBHandler) begins to come into the picture by executing a query on the AudiSim database to retrieve the number of moves left for the user to successfully complete the training session.
AudiSim then passes the value to the moveArray for the program to allocate the space needed to store the moves. After that, AudiSim explicitly invoke the method retrieveSetofMoves(int bpm) in DBHandler to retrieve the set of keys left, fills the moveArray up with the set of keys stored in the result set and DBHandler terminates the loop after it reaches the last set of key. Next, AudiSim shuffles the key in order for them to appear in randomized order (if you do not want the key to be shuffled, change the shuffle = true to false).
After shuffling the collection of moves, AudiSim retrieves the first set of move in the collection and assigns a copy of current move to the log data buffer (var : dataBuffer). AudiSim identifies the length of the current move to ascertain the number of image panel needs to be repainted. Each of the keys goes through the arrow handler procedure, the segment where the chance handler kicks in and assigns chance flag to the selected keys.
Lastly, AudiSim creates the key listener to listen for user input for appropriate changes to be made on the image panel when user presses an arrow key. AudiSim also creates file handles and output stream associated with the replay and log file if the user specifies the session to be saved and logged.
cPrOxCMS (2008)
cPrOxCMS is a web content management system for MapleStory private server written fully in PHP. It serves as a framework for the players to access information and several features in the server. The core feature of cPrOxCMS which makes it distinctive from all other available MapleStory CMS around is the crafting system, which I would delve deeper into it in the latter part of this post(don't expect much though, as I will only be posting screenshots of it).
I won’t be explaining much about this release because if I were to do so for these 70 files of PHP code, I believe I’ll lose my sanity before I even finish writing the first half of the files. Hence, I will only be listing out the features for this release. Pardon my inability to fully enlist the features of the CMS because my memory decided to fail me at this point of time.
Warning: You are strongly advisable to place the news_insert.php, news_insertion.php and a copy of config.php in another hidden directory in your webserver otherwise anyone who accesses your web could easily load the news insertion page and create a big havoc in the news section.
Features
-Sanitizing forms/queries input to prevent SQL Injection attacks
-Cookie checking to prevent cookie modification
-Auto tracing unrecognized user inputs
-Nation
-Shoutbox
-Ranking
-Crafting system
-Allows user to customize permissions for viewing buddy and guild list
-Allows user to view guild members/BBS threads and posts
-Allows user to view buddy list
-Allows user to upload screenshots
-Allows user to perform character reset
-Allow user to add stats
-Fully customizable crafting rate
Download links
Tuesday, November 17, 2009
-2o1o- DotA Assistant
DotA Assistant is a program fully built in Java, with pixel scanning merchanism as the core. It's meant to work alongside with Warcraft III : DotA, with the features listed as follows:
Main features :
- Detecting opponent heroes and notifying the user when the program loses vision of the heroes in the minimap
- Assessing the battling ability of heroes and displaying the probability of one hero winning another to the user
Note : Bear in mind that this program is still heavily being tested and the information produced are therefore susceptible to inaccuracy.
The Hero Detection System
At the beginning of the game the program will scan the minimap and register the hero when its found by the scanner.
When any of the registered heroes appear to be missing, the user will be notified by the program about the absence of the hero.
The hero will also be notified when the hero is no longer absent.
The Hero Assessment System
The hero assessment system basically retrieves the attribute information of the hero(es) and produces useful information such as the effective hitpoints, attack cooldown, amount of time needed for a hero to kill another hero, and the probability of a hero winning another hero.
(The timer starts at the moment the projectile of my hero's shot reaches my opponent)
The screenshot above demonstrates the output produced by the hero assessment system. Pay more attention to the fourth line counting from the bottom, Amount of time needed for Drow Ranger to kill Dwarven Sniper and the magnified timer on top(click on the ss if you're unable to view it in full).
The magic of mathematics,
31 + 15.84 = 46.84s = 47s(as shown in the screenshot)
Note : The information produced only serve as a benchmark for the user to gauge the battling ability of the selected hero. By the program telling me that my hero needs 15.84s to kill my opponent DOES NOT mean that my hero would take that amount of time to kill the hero when it comes to real-play situation.
Nature of the program
It's always more interesting when it comes to the conceptual explanation of any of the -2o1o- grand releases because their true beauty and elegance do not come from the output it produces to the user, which I've briefly covered some of them above; it conversely comes from the inner design of the program: the robustness of algorithm, the style of programming logic, the control flags, the playaround of APIs. These are the few elements, which forms the core, the foundation of the program.
Please do not feel perplexed as you scroll down and read the walls of technical terms/graphical illustration. Trust me, its not as hard as it seems if you have the foundation in programming; you just have to be ready work with pixels and myriads of RGB codes
This program technically encompasses of three main modules:
- The Minimap Scanner
- The Name Scanner
- The Attribute Scanner
The Minimap Scanner
The program uses Robot API to create an image capture at the minimap region of the screen(captured image as shown in the picture above). Initially the program was set to scan at an incremental value of 3 for both the x and y value each time it moves, it was however being found out that the algorithm could not achieve the intended accuracy after performing several stages of testing.
For the sake of producing promising information to the user(I would not say its 100% accurate but at least the inaccuracy is pardonable), I've decided to use the performance-accuracy tradeoff approach. The accuracy of the scanner was increased at the expense of the CPU performance by decreasing the incremental value for both x and y from 3 to 1, thus proportionally increasing the number of pixel scans from 4333 1/3(195 x 200 / (3x3)) to 39000(195 x 200 / (1x1)) scans for each cycle(195 x 200 is the size of the minimap).
Note : For users with low computer performance, the minimap scanner will hog a considerable amount of CPU usage whenever its scanning takes place. I'll revise the algorithm so that it overcomes such deficiency if there's a high demand from the users.
Once the program found a pixel containing the RGB value listed in the predefined sets of RGB range, it will register the found pixel as an identifier of a player, in accordance with the player number set for the predefined range of RGB codes
If any of the opponent heroes happens to be missing from the sight of the user or his/her allied units, the mini dot corresponding to the missing hero would likewise be missing from the minimap. This event would render the minimap scanner unable to scan for the missing hero range of RGB codes, thus making it consider the hero missing from the map. When this happens, the Robot API typeMessage() method would explicitly be invoked by the program to send a message notifying the user regarding the absence of the hero.
The Name Scanner
Its the human's nature for being able to recognize the letters at a glance, irregardless of how badly written the letter is or how much the letter deviates from its actual style. This is also the main reason why image verification mechanism become one of the most powerful and prominent security measure in the world.
But let's face it, computer processes at a speed where not even millions of our human brain could beat it. When it comes to character recognition, I certainly do not deny the fact that the computer may not be able to recognize letter as good as the human at first; but when its crafted with a well structured algorithm defining the technical way of recognizing the letters in procedural manner, it's not a total impossibility for a computer attain the same level of character recognition as the human. When this stage is achieved, the level and amount information a computer produces over time could far surpass the human.
I crave your pardon for being slightly out of topic, but this is exactly what's on my mind before it comes to the development of the scanners, especially the name and attribute scanners. You see, being able to analyse up to 10k characters and generating up to thousands lines of information per sec to the user, I believe that it's not beyond the bounds of possibility for a player to play DotA in a complete flawless style. Let's get back on topic, introducing you the second core module of the program - the name scanner.
Similar to the aforementioned minimap scanner, the name scanner uses Robot API to capture an image of the region which contains the level and the name of hero.
The only difference is that the y value works as a baseline and remains as a constant whenever the name scanner does its scanning, whereas the value of x changes as the scanner pointer moves. The scanner travels on x axis, on the basis of pixel by pixel, ignores every black pixels it come across and stops at the point where a white pixel is detected.
Take the picture above as an example, the first point, with the coordinate being (538,733), is the point where the scanner stops(we call it the point of origin) to scan for the pattern of the character it stops at(in this case its the letter L), and compares with the predefined pattern points to determine what letter it is.
The letter which resides on the point of origin has to match all the predefined pattern points before it's being recognized as the corresponding letter. For simplicity's sake, I'll omit 8 out of 12 of the pattern points for the letter L, leaving only 4 pattern points to be discussed.
Note : the combination of both integer value x and y form a pattern point which denotes the distance of the pattern point from the point of origin.
Since 8 of the pattern points for the letter 'L' have been omitted, lets just deny their existence. Don't look at them, don't even think of the purpose of them being there; just focus on the four points circled in red.
Note : The coordinate system in computer does not work the same as the Cartesian coordinate system, the decrement of value y in the computer system indicates that the point is moving upwards instead of downwards. In another word, the smaller the value y of a point, the higher the position of the point in the computer coordinate system.
The scanner begins by checking the first pattern point for the letter 'L', (0,-4). If the pixel 0 x and -4 y distance away from the origin is a white pixel the scanner will proceed to the second pattern point; if its a black pixel, the scanner directly negate the possibility of the letter being 'L'. In this case the origin is (538,733) and the pixel at the position 0 x and -4 y distance away from the origin is not a black pixel.
Next, the scanner will check the second pattern point for the letter 'L' (1,-4).
Similar to the concept above, if the pixel at the position 1 x and -4 y distance away from the origin is not a black pixel, the program will preserve the possibility of the letter being 'L' and continue to check for the next pattern point.
The process repeats until the scanner reaches the last pattern point.
If every of the pixels scanned does not appear to be a black pixel to the program, the program will recognize the letter as the corresponding letter(in this case the program will recognize the letter above as a 'L').
The character precedence
Now we've finished about the character recognition part; the program could now recognize the letters individually if we supply the pattern points as a guideline for the program, but what happens if the program tries to recognize letters bearing a resemblance to another letter?
For instance, the letter c could be recognized by the program as an 'e' and the letter l could be recognized by the program as a 'L', owing to the fact that the letter e and l have all the pattern points for the letter c and L respectively. When the scanner scans the line of text 'Level 1 Bone Fletcher' it will produce the output Llecvecl 1l Bocnec Fllectlchlecr.
A simple workaround for this is to specify a set of precedence for every of the letters bearing a resemblance to another letter. Taking the first part as an example, if the scanner scans and recognizes a letter being both letter a and i, it will negate the possibility of the letter being 'i' because logically the letter a has more pattern points(more pixels) compared to the letter 'i'.
The Attribute Scanner
The attribute scanner does not differ much from the name scanner except that instead of having only one baseline, the attribute scanner scans on 5 different elevation of y axis. After the translation of pixels to numbers, the translated numbers will be passed to the DotALogic class to process other information which the users may deem useful, such as the EHP, DPS, AC, damage reduction, hp regen, mp regen. The formulas to perform calculation as such are listed as follows:
Assume that
Average Damage = avgDmg
Minimum Damage = minDmg
Maximum Damage = maxDmg
Additional Dmg = addDmg
Total Dmg = totalDmg
Attack Cooldown = AC
Damage per second = DPS
Effective HP = EHP
Hit point = HP
Time needed to kill = TNTK
Probability of winning = PoW
Additional Agility = addAgi
Damage Reduction = dmgReduction
Additional Armor = addArmor;
Hero1 = First hero to be compared
Hero2 = Second hero to be compared
avgDmg = (minDmg + maxDmg) * 0.5
totalDmg = avgDmg + additionalDmg
AC = baseAC / (1 + ((agi + addAgi) * 0.01))
DPS = (addDmg + avgDmg) /AC
DPS = (addDmg + avgDmg) / ( baseAC / (1 + ((agi + addAgi) * 0.01)))
dmgReduction = 1 - (1 / (1 + ((armor + addArmor) * 0.06)))
EHP = HP/damageReduction
EHP = HP / 1 - (1 / (1 + ((armor + addArmor) * 0.06)))
Hero1.TNTK = Hero2.HP / ((Hero1.totalDmg * (1-Hero2.dmgReduction / Hero1.AC) )-Hero2.HPRegen)
Hero1.TNTK = Hero2.HP / (((((Hero1.minDmg + Hero1.maxDmg) * 0.5) + Hero1.additionalDmg) * (1-(1 - (1 / (1 + ((Hero2.armor + Hero2.addArmor) * 0.06)))) / (Hero1.baseAC / (1 + ((Hero1.agi + Hero1.addAgi) * 0.01)))) )-Hero2.HPRegen)
Hero2.TNTK = Hero2.HP / ((Hero2.totalDmg * (1-Hero1.dmgReduction / Hero2.AC) )-Hero1.HPRegen)
Hero2.TNTK = Hero1.HP / (((((Hero2.minDmg + Hero2.maxDmg) * 0.5) + Hero2.additionalDmg) * (1-(1 - (1 / (1 + ((Hero1.armor + Hero1.addArmor) * 0.06)))) / (Hero2.baseAC / (1 + ((Hero2.agi + Hero2.addAgi) * 0.01)))) )-Hero1.HPRegen)
Hero1.PoW = 1 - Hero1.TNTK / Hero1.TNTK + Hero2.TNTK
Hero1.PoW = 1 - (Hero2.HP / (((((Hero1.minDmg + Hero1.maxDmg) * 0.5) + Hero1.additionalDmg) * (1-(1 - (1 / (1 + ((Hero2.armor + Hero2.addArmor) * 0.06)))) / (Hero1.baseAC / (1 + ((Hero1.agi + Hero1.addAgi) * 0.01)))) )-Hero2.HPRegen)) / ((Hero2.HP / (((((Hero1.minDmg + Hero1.maxDmg) * 0.5) + Hero1.additionalDmg) * (1-(1 - (1 / (1 + ((Hero2.armor + Hero2.addArmor) * 0.06)))) / (Hero1.baseAC / (1 + ((Hero1.agi + Hero1.addAgi) * 0.01)))) )-Hero2.HPRegen))+(Hero1.HP / (((((Hero2.minDmg + Hero2.maxDmg) * 0.5) + Hero2.additionalDmg) * (1-(1 - (1 / (1 + ((Hero1.armor + Hero1.addArmor) * 0.06)))) / (Hero2.baseAC / (1 + ((Hero2.agi + Hero2.addAgi) * 0.01)))) )-Hero1.HPRegen)))
Future enhancement
- Item recognition
I've already completed coding the algorithm its just that the process of extracting up to 3k sets of item RGB codes is a big hurdle for me, even with the assistance of an API written intentionally for this purpose. - Skill prediction
To calculate the damage and predict the sequence of skill the opponent will use on the user's hero - Enhanced version of hero assessment system
Enhanced version of hero assessment version will take orb effects such as critical strike, life steal and bashing into account. - Performance
The current algorithm scans up to 500k or even 1 million pixels per second would gradually lead to CPU exhaustion, it needs to be revised and rewritten.
(a screenshot of the database for item recognition module in 0th Normalization Form 0NF)
I won't release the executables or the source code anytime soon because currently the codes are still at the level of content coupling and commucational cohesion and furthermore there are still lots of architectural revision to be made.
Tuesday, November 18, 2008
-2o1o- Birthday Attack!
In probability theory, the birthday problem, or birthday paradox[1]
pertains to the probability that in a set of randomly chosen people some pair of
them will have the same birthday. In a group of 23 (or more) randomly chosen
people, there is more than 50% probability that some pair of them will both have
been born on the same day. For 57 or more people, the probability is more than
99%, reaching 100% as the number of people reaches 366.[2] The mathematics
behind this problem leads to a well-known cryptographic attack called the
birthday attack. [Wikipedia.org]
“How many birthdays does a person have in a year?”
“One”
“Then assume in a group of 23, how do we calculate the probability of a person having the same birth date as another?”
“Since the probability of one person’s birthday falls on a day is 1/365, so in a group of 23 it would be 23/365 = 6.3%”
(not taking the presence of leap years into account)
WRONG
So what’s the right way of answering this?
The simpler version of formula is
(number of people * (number of people – 1) ) / 2
(23 * 22) / 2 = 253 pairs
If you’re so inclined to find out the proper way of solving the case above, you can search for permutations and combinations on the web. Yea, and be prepared to work with big numbers.
Next, we calculate the probability of one person’s birthday DOESN’T fall on a day, and that would be
364/365 = 0.99726 (99.72%) it seems really big now but trust me, after a number of self multiplication, it shrinks really fast
we then apply the 253 as the exponent since now we are trying to get the probability of 253 pairs(group of 23).
and it would be
0.99726^253 = 0.4995 (49.95%) get what i was trying to say?
Now we’ve gotten the probability, but we are not done yet. Note that we were only focusing on the problem scenario (with the keyword DOESN’T), so in order to get the probability of a group of 23 sharing the same birth date we would do
1 – 0.4995 = 0.5005 (50.05%)
A graph showing the approximate probability of at least two people sharing a birthday amongst a certain number of people.
Now that we’ve calculated the number of pairs, the problem scenario’s probability and the actual probability. So what does it got to do with cryptography?
The Digital Signature
We are proceeding to the interesting part; let’s take the common archetypes in cryptography as an example: Alice, Bob, Mallory, Carol and Bob. Suppose that Alice wants to send an encrypted letter to Carol, and with the content that she wants to recommend Dave as the IT manager in the company, her letter would look something like
Dear Carol,
I would like to recommend Dave as the IT manager of our company because from my observation he has the potential to handle the role as an IT manager. Furthermore he is hardworking and does not late..
Then the Alice would digitally sign the letter using PGP(Pretty Good Privacy) and hand it to her secretary Mallory. However due to some personal disputes Mallory had with Dave, Mallory does not want Dave to hold the position. Hence, she would write another letter to stop Dave from getting the job and the letter would look like
Dear Carol,
I feel that Dave is not a perfect candidate for the role as the IT manager because he is often late and does not show any commitment in his work. Furthermore he does not know how to cooperate with his colleagues..
Indeed this letter would be perfect for Mallory to achieve her goal, but obviously Carol would not sign this letter and it is computationally infeasible for Mallory to obtain the private key from the digital signature. To further enhance your understanding on how the Birthday Attack takes place in this scenario, I will break down the following explanation into steps.
1. Alice is given two keys. One is called the public key and another one is called the private key.
2. The public key is given to anyone who needs it, but the private key is kept to her only.
3. Both one of the key can encrypt the data and the other key can be used to decrypt the data
4. Alice can encrypt the letter using her private key and Carol can decrypt the letter thereafter using her public key and vice versa. Mallory has the access of the encrypted letter but without the Alice's private key she can do nothing with it.
5. In order to sign a document, Alice will convert the letter into just a few lines by a process called “hashing” (ie: MD5, SHA). These few lines are called the message digest, and since it was created using one-way-hash function, it cannot be reverted back to its initial plaintext form.
6. Alice then encrypts the message digest using her private key and that would produce the digital signature.
7. Alice attaches the digital signature to the letter and passes it to her secretary Mallory, asking her to pass the encrypted letter to Carol.
8. Mallory decrypts the digital signature back to the message digest form.
We assume that the letter
Dear Carol,
I would like to recommend Dave as the IT manager of our company because from my observation he has the potential to handle the role as an IT manager. Furthermore he is hardworking and does not late..
produces the message digest
iOJfioJcioSJSfioJSVkSMflsdfkLNSgWIjfLImnvcLKFnLhfIOShfIEGNLSnbZLSHgfIHFLHGliHklvNSlgfNSLHISLIhfLnfLIjhgPQOgpGiqojOGVIAOJBKdmaKLdnVJW
and the letter
Dear Carol,
I feel that Dave is not a perfect candidate for the role as the IT manager because he is often late and does not show any commitment in his work. Furthermore he does not know how to cooperate with his colleagues..
produces the message digest
SGIOPDOPvmKFmvklNLSDKFLWGOPvjWIPGKBVmWKLVnLKTJWPFjIPtfwnMCnMFWPGUWJbpJKVCM:LKCMQJFaOIBPOKjfSFjKSfjKSLvjKLvKSjvSjdfjsf
If Mallory were to substitute the second letter with the original one and pass it to Carol, she would be able to tell that the letter has been substituted since the message digest produced by the hashing function of the second letter does not match the message digest produced by the decryption of the digital signature.
The Birthday Attack takes place
Mallory writes the letter again
Dear Carol,
I (feel think) that Dave (is not isn’t) a perfect candidate for the role as the IT manager because (he is he’s) (often always) late and (does not doesn’t) show any (commitment responsibility) in his work. (Furthermore Moreover) he (does not doesn’t) know how to (cooperate work) with his colleagues..
The ( aa bb ) denotes the alternative usage of word, which means either aa or bb can be used to produce different variation of letters while keeping the message conveyance of the letter intact. Since there are 9 combinations of alternative words, Mallory is able to generate up to 2^9 amount of similar letters using all the possible alternative pairs. Mallory would then hash all of the generated letters and given enough of combinations she would eventually be able to come up with a letter with exactly the same message digest as the original one.
For example,
Letter (257/512)
Dear Carol,
I think that Dave is not a perfect candidate for the role as the IT manager because he is always late and doesn’t show any responsibility in his work. Moreover he does not know how to work with his colleagues
Message Digest :
SfiPSJifoSJidfOSJDIONGWIOSNVISjfWPJDiMvNWkjdHSDjASHfuSHfoSHfoWSHfjSHDjhfushfOUShfJShfOSdhAODHOAhfOWSFhSJfhSdjHJSdhUShJfhSJfhJSHFj
(does not match with the original one)
Letter (412/512)
Dear Carol,
I feel that Dave isn’t a perfect candidate for the role as the IT manager because he’s often late and doesn’t show any commitment in his work. Moreover he does not know how to work with his colleagues
Message Digest :
iOJfioJcioSJSfioJSVkSMflsdfkLNSgWIjfLImnvcLKFnLhfIOShfIEGNLSnbZLSHgfIHFLHGliHklvNSlgfNSLHISLIhfLnfLIjhgPQOgpGiqojOGVIAOJBKdmaKLdnVJW
(matches the original one)
Consequently, Carol would not be able to notice that the letter has been altered since the letter generates the same message digest as the digital signature.
The Birthday Attack program
I've written a program to prove the mathematics behind the probability theory. In the program there's only one constructor, accepting two parameters.
public BirthdayAttack(int itemRange,int amount) {
I shall stick to the example given above, using the same itemRange(the days) and the amount(the number of people in a group)
public static void main(String[] args) { BirthdayAttack ba = new BirthdayAttack(365,23); ba.generate(); ba.compare(); ba.calculate(); }
The Output
Generated numbers
-----------------
39
155
191
327
352
163
121
282
8
166
210
182
331
310
29
358
352
285
71
92
66
353
28
--End of number generation--
2 352 spotted
The probability :
50.048%