SPAM PROTECTION
The author will discuss, one by one, the ways to reduce spam and its adverse impact. From a technical point of view, spam can be reduced by filtering.
Several filtering methods can be used to prevent spam, including:
a. Keyword filtering
This method is a form of Application Layer Filtering (ALF). It blocks spam based on certain words that often appear in spam mail, for example "viagra" or "porn".
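As a rough illustration of keyword filtering, the sketch below checks whether any whole word of a message appears in a block list. The function name and the list itself are assumptions made for this example; only the two sample words come from the text.

```python
import re

# Hypothetical block list; the two entries are the examples given in the text.
BLOCKED_KEYWORDS = {"viagra", "porn"}

def contains_blocked_keyword(message: str) -> bool:
    """Return True if any whole word in the message is a blocked keyword."""
    # Lower-case the text and split it into runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", message.lower())
    return any(word in BLOCKED_KEYWORDS for word in words)
```

Because the match is on exact words, a deliberately misspelled variant such as "v1agra" is not caught, which is the main weakness of this kind of filter.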
b. Signature-Based Filtering
This method compares incoming e-mails with spam mails that have already been identified. This is done by creating several fake e-mail addresses. Spam mail, which is usually sent to hundreds of addresses, will also reach these fake addresses, so by listing the senders that deliver mail to the fake addresses, spam mail can be blocked. One way to show that two e-mails are the same is to give every e-mail a "signature". A simple way to produce such a signature is to assign a number to each letter and then add all these numbers up, so that different e-mails generally have different signatures. Two e-mails that have the same signature and were sent to several addresses can then be categorized as spam mail. This is how signature-based filtering is applied. However, this method is easily defeated by spammers: simply adding a different character to each copy of a spam mail gives every copy a different signature. This method is therefore not very effective as a spam filter.
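The additive signature described above, and the way a spammer defeats it, can be sketched as follows. Using the Unicode code point as the "number for each letter" is an assumption of this example; the text does not say which numbering is used.

```python
def signature(message: str) -> int:
    """Sum one number per character (here, its code point: an assumption)."""
    return sum(ord(ch) for ch in message)

original = "Buy now"
identical_copy = "Buy now"
altered_copy = "Buy now!"  # one extra character added by the spammer

# Identical copies share a signature, so they can be matched.
same = signature(original) == signature(identical_copy)
# A single added character changes the signature and evades the match.
evaded = signature(original) != signature(altered_copy)
```

A plain sum also ignores character order, so unrelated messages can collide; "ab" and "ba" get the same signature.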
c. Bayesian (Statistical) Filtering
Bayesian filtering is a more recent anti-spam filtering method. It identifies spam based on the words (tokens) contained in an e-mail. The filter first needs to be trained using two e-mail collections, one of spam mail and one of legitimate mail. For each new e-mail received, the Bayesian filter can then estimate the probability that it is spam, based on the words that often appear in the spam collection or in the legitimate collection. Bayesian filters block spam effectively because they can automatically categorize mail as spam or legitimate.
The weakness of the Bayesian chain rule is that each word is assumed to be separate and independent of the others, yet in a real text every word is connected with the others. This weakness is addressed by the chi-squared probability algorithm developed in the SpamBayes project (discussed below).
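The training and scoring steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Paul Graham's exact formula: the function names, the clamping to the range 0.01 to 0.99, and the neutral value 0.5 for unseen tokens are all choices made for this example. The product in `spam_probability` is where the independence assumption enters.

```python
from collections import Counter
from math import prod

def train(messages):
    """Count in how many messages of a collection each token occurs."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token); tokens never seen in training get 0.5."""
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    if s + h == 0:
        return 0.5
    # Clamp so that one token cannot force the result to exactly 0 or 1.
    return min(0.99, max(0.01, s / (s + h)))

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine the token probabilities, treating tokens as independent."""
    probs = [token_spam_probability(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in set(text.lower().split())]
    p_spam = prod(probs)
    p_ham = prod(1.0 - p for p in probs)
    return p_spam / (p_spam + p_ham)

# Tiny made-up training collections, for illustration only.
spam_training = ["cheap pills online", "cheap offer today"]
ham_training = ["meeting agenda today", "project meeting notes"]
spam_counts, ham_counts = train(spam_training), train(ham_training)
```

With these collections, `spam_probability("cheap pills", spam_counts, ham_counts, 2, 2)` comes out close to 1 and the same call for "meeting notes" comes out close to 0, while a word that is equally common in both collections, such as "today", stays at 0.5.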
Bayesian Algorithm Development
One development of the Bayesian filtering algorithm is the SpamBayes project, which aimed to improve the Bayesian filter algorithm first developed by Paul Graham. The SpamBayes project is led by Gary Robinson and Tim Peters. In principle, SpamBayes uses the same Bayesian algorithm as Paul Graham. Its advantage is that SpamBayes can categorize mail as spam mail, non-spam mail (ham), or blink mail. Blink mail is a message that cannot be confidently rated as either spam or ham, in other words an "unsure" message. This categorization is learned in the same way, by training the SpamBayes algorithm on e-mails already classified as spam or ham.
The architecture of the SpamBayes system differs from Paul Graham's Bayesian algorithm in several parts, among them:
a. Tokenizing
The tokenizer reads the mail and divides it into words (tokens). Tokenizing can be performed on the message body, message headers, HTML code, and images. However, because this project took its sample spam and ham mails from different sources, tokenizing in the SpamBayes project is done only on the message body. Tokenizing the message body is done by detecting the white space between words. Tokenizing both the message body and the headers would certainly allow spam and ham to be characterized better. Header tokenizing can be done by counting the number of recipients in the recipient (To/Cc) headers, while HTML tokenizing can be done on code such as "font", "table", or "background". Tokenizing can also flag a message that has no Subject header or no From address, so that it is classified as spam mail.
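A minimal sketch of the body and header tokenizing described above is given below. It is not the SpamBayes tokenizer; the function names and the synthetic token names ("recipients:N", "no-subject", "no-from") are hypothetical, and splitting the recipient headers on commas is a simplification that ignores quoted display names.

```python
def tokenize_body(body: str) -> list[str]:
    """Split a message body into tokens on runs of white space."""
    return body.split()

def header_tokens(headers: dict[str, str]) -> list[str]:
    """Turn header features into synthetic tokens (names are hypothetical)."""
    recipients = [addr.strip()
                  for name in ("To", "Cc")
                  for addr in headers.get(name, "").split(",")
                  if addr.strip()]
    tokens = [f"recipients:{len(recipients)}"]
    if not headers.get("Subject", "").strip():
        tokens.append("no-subject")  # missing or empty Subject header
    if not headers.get("From", "").strip():
        tokens.append("no-from")     # missing or empty From address
    return tokens
```

The synthetic tokens can be fed to the classifier alongside the body tokens, so that a missing Subject or an unusual recipient count is learned as evidence like any other word.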
b. Combining and Scoring
The rest of the SpamBayes system consists of scoring and combining. This part distinguishes the SpamBayes system from the original Bayesian algorithm of Paul Graham.
Paul Graham's algorithm:
Paul Graham's Bayesian algorithm only gives a mail a score between 1 for pure spam and 0 for pure ham, and no range of scores is categorized as "unsure". All mail is categorized as either spam or ham, which can lead to categorization errors. The figure below shows the problem in Paul Graham's algorithm:
Figure 1. Plot of message scores using Paul Graham's approach
In the figure above, the X axis shows the message score on a scale of 0-100 (0 is pure ham and 100 is pure spam). The Y axis shows the number of messages (on a logarithmic scale). The figure shows that most spam scores around 100 and most ham scores around 0. However, quite a lot of ham also scores around 100, and quite a lot of spam scores around 0. These are message categorization errors. The scoring technique devised by Gary Robinson produces a plot like this:
Figure 2. Plot of message scores using Gary Robinson's approach
This technique gives different results. The figure shows that the ham and spam scores overlap. This can be handled by choosing a cut-off value, say a, such that a score above a means spam and a score below a means ham. Whereas Paul Graham's algorithm gives many spam mails a score near pure ham and vice versa, Gary Robinson's technique overcomes this: no spam mail has a score near pure ham.
Gary Robinson used the Central Limit Theorem to produce the plot above. This approach produces two internal values, one for spam and one for ham, and can give an "unsure" response if the ham and spam values are both too high or both too low. This cannot be done with Paul Graham's algorithm.
Gary Robinson later replaced the Central Limit Theorem approach with one based on the chi-squared probability. The chi-squared approach is similar to the central limit approach; its advantages are that it does not have the training problems of the central limit approach and that it gives better categorization results.
The chi-squared approach produces two values, the probability of ham ("*H*") and the probability of spam ("*S*"). A spam mail has a high *S* and a low *H*. When a mail has *S* and *H* values that are both high or both low, the resulting probability is approximately 0.5, which means the mail is classified as neither spam nor ham. The SpamBayes system calls this condition "blink" (unsure). The figure below shows the results obtained with the chi-squared approach:
Figure 3. Plot of message scores using the chi-squared approach
As can be seen, at the end of the process there are three possible outcomes: spam, ham, or blink. As discussed earlier, in the SpamBayes system a message that is difficult to categorize as spam or ham is categorized as blink. For example, a commercial e-mail from a company that does business with our company may at first be regarded as "unsure" mail, since spam and commercial mail use similar language. With Paul Graham's Bayesian algorithm, this mail would still be classified as either spam or ham, which can lead to a false positive or a false negative. After some training based on the sender address or the products the company offers, "blink" mail can then be categorized as spam or ham.
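The chi-squared combining step described in this section can be sketched as follows, starting from per-token spam probabilities strictly between 0 and 1. This is a compact illustration of the published idea rather than the SpamBayes source code; the function names are this example's own, the cut-off values 0.2 and 0.9 are assumptions, and very long messages would need extra care against floating-point underflow.

```python
from math import exp, log

def chi2q(x2: float, dof: int) -> float:
    """Probability that a chi-squared variable with an even number of
    degrees of freedom `dof` exceeds `x2`."""
    if dof % 2 != 0:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = total = exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_squared_scores(probs):
    """Return (S, H, combined score) for a list of token spam probabilities."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(log(p) for p in probs), 2 * n)
    return s, h, (s - h + 1.0) / 2.0

def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a combined score to spam, ham, or blink (cut-offs are assumed)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "blink"
```

A message whose tokens all look spammy, for example `[0.99] * 5`, gets a high *S*, a low *H*, and a combined score near 1. A message with equally strong evidence on both sides, such as `[0.99, 0.01, 0.99, 0.01]`, gets high values for both *S* and *H*, so the combined score lands near 0.5 and the message is reported as blink.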
d. Rule-based (heuristic) filtering
This filter blocks spam mail by looking for patterns that indicate spam, for example dirty words, many capital letters, many exclamation points, or an incorrect delivery date. The disadvantage of this method is that its rules are static, so if spammers start sending spam with a new pattern, new rules must be added to the filter. With a Bayesian filter, by contrast, we simply tell the filter that it classified an e-mail wrongly, and the filter automatically learns the pattern found in that e-mail.
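A small sketch of such static rules is shown below. Every threshold, rule name, and the word list are assumptions chosen for illustration; a real rule set would be much larger, and it is exactly this hand-maintained list that has to be extended whenever spammers change their pattern.

```python
import re
from datetime import datetime, timedelta

# Hypothetical word list standing in for the flagged words mentioned above.
FLAGGED_WORDS = {"free", "winner"}

def triggered_rules(subject, body, sent_at, received_at):
    """Return the names of the (hypothetical) rules a message triggers."""
    text = f"{subject} {body}"
    letters = [ch for ch in text if ch.isalpha()]
    rules = []
    if letters and sum(ch.isupper() for ch in letters) / len(letters) > 0.5:
        rules.append("mostly-capitals")
    if text.count("!") >= 3:
        rules.append("many-exclamations")
    if FLAGGED_WORDS & set(re.findall(r"[a-z]+", text.lower())):
        rules.append("flagged-word")
    if sent_at > received_at + timedelta(days=1):
        rules.append("future-date")  # claimed send date is after receipt
    return rules

def is_spam(rules, threshold=2):
    """Treat a message as spam once enough rules fire (threshold assumed)."""
    return len(rules) >= threshold
```

For instance, a message with the subject "YOU ARE A WINNER!!!" and a send date a month in the future triggers all four rules, while an ordinary lunch invitation triggers none.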
e. Challenge-response filtering
When we receive an e-mail from someone for the first time, the challenge-response filter sends an e-mail back to the sender's address, asking the sender to visit a certain web address and fill out a form before the e-mail is delivered to us. In this way we get an accurate spam filter, because only senders who are truly interested in reaching us will carry out the procedure. However, this method can be called "rude", because it makes other people do extra work to send e-mail to us. Another drawback is that legitimate e-mail can be lost or delayed, because the sender does not know that the challenge-response procedure must be completed before the e-mail is accepted. A further flaw is that the filter selects e-mail based only on the sender address, so spammers who spoof addresses can defeat it. On its own, the filter is therefore not very effective at blocking spam. One remedy is to combine it with Bayesian filtering: e-mail that the Bayesian filter categorizes as spam is then challenged by the challenge-response filter. In this way the accuracy of the Bayesian filter increases, and the challenge-response filter is also used effectively.
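The combination suggested above, where only mail that the Bayesian filter rates as spam is challenged, might be wired together roughly like this. The function names, the returned labels, and the 0.9 cut-off are assumptions for illustration; sending the challenge e-mail and serving the web form are left out.

```python
def route_message(sender: str, spam_score: float,
                  verified_senders: set[str], spam_cutoff: float = 0.9) -> str:
    """Decide what to do with a message: "deliver" or "challenge"."""
    if sender.lower() in verified_senders:
        return "deliver"    # sender already completed a challenge
    if spam_score >= spam_cutoff:
        return "challenge"  # Bayesian filter says spam: ask for confirmation
    return "deliver"        # looks legitimate: no extra work for the sender

def complete_challenge(sender: str, verified_senders: set[str]) -> None:
    """Record that the sender filled out the form, so later mail is delivered."""
    verified_senders.add(sender.lower())
```

With this arrangement, most legitimate senders are never challenged, which reduces the extra work and the risk of delayed mail described above.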
From the explanation above, it can be concluded that filtering can be done in many ways.
Another technical approach is blocking. Filtering will not completely overcome spam; it only helps alleviate the problem. On the other hand, filtering can also bring new spam. This can happen when the seller of a commercial spam filter deliberately puts a client's name on a spam list, so that a server manager (client) who does not want to be bothered by spam is forced to buy filtering from that commercial vendor.
E-mail filtering can distinguish legitimate mail from spam, but it cannot prevent spam from entering the network. That requires a way to block spam, commonly called a Real-time Blackhole List (RBL). It blocks spam coming from other machines before it enters the network. The list of machines that send spam is continually updated by an organization, and it can be used to reject e-mail, or anything else, coming from a machine listed as a spam sender.
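Such lists are commonly published through DNS: the octets of the sending machine's IPv4 address are reversed and prepended to the list's zone name, and the resulting name is looked up. The sketch below only builds that name, so it runs without network access; the zone "dnsbl.example.org" is a placeholder, not a real list.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blackhole list."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Placeholder zone name, used only for illustration.
query = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

By convention, if looking up this name returns an address record the machine is listed, and if the name does not exist the machine is not listed.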
There are several ways to do blocking, among others:
- Address blocking
This method blocks spam mail based on IP addresses, domain names, or specific e-mail addresses that are considered to belong to spammers.
- Black listing
This method is similar to address blocking: it blocks spam based on a list of addresses of known spammers. Black listing is usually done by volunteers and published as a spam database that anyone can use. One accessible black list is the Open Relay Database, ORDB.org.
- White Listing
In contrast to black listing, a white list contains addresses categorized as senders of legitimate mail. Mail from a sender address that is not on this list is assumed to be spam.
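The difference between the blocking methods above comes down to which list is consulted and what happens to senders who are on no list. A minimal sketch, with hypothetical function names and made-up list entries:

```python
def is_blocked(sender: str, sender_ip: str, blocked_ips: set[str],
               blocked_domains: set[str], blocked_addresses: set[str]) -> bool:
    """Address blocking / black listing: reject only what is on a list."""
    address = sender.lower()
    domain = address.rsplit("@", 1)[-1]
    return (sender_ip in blocked_ips
            or domain in blocked_domains
            or address in blocked_addresses)

def passes_whitelist(sender: str, whitelist: set[str]) -> bool:
    """White listing: accept only senders that are on the list."""
    return sender.lower() in whitelist

# Made-up entries for illustration (lists are assumed to be lower-case).
blocked_ips = {"192.0.2.99"}
blocked_domains = {"spam.example"}
blocked_addresses = {"offers@example.net"}
whitelist = {"colleague@example.com"}
```

An unknown sender passes the black list but fails the white list, which is why white listing blocks more spam at the cost of also rejecting legitimate first-time senders.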
CONCLUSION
Based on the discussion above, it can be concluded that:
- Spam is junk mail
- Whether an e-mail is considered junk depends on each individual's perspective
- An e-mail is spam if it contains a virus or malware
- There is no single way to totally eliminate spam; it is only possible to reduce the amount that gets in
Ways to protect e-mail or a mail server from spam:
- Never respond to spam mail (typically offers of certain products, sent in large numbers)
- Do not reply with the word "remove", because your e-mail address will be recorded as active and you will continue to be sent spam
- Do not access sites recommended by spam.