Wednesday, August 02, 2006

Google
 
Web WORLDiKEYWORD

KEYWORD DENSITY

Keyword density

From Wikipedia, the free encyclopedia


Keyword density is the percentage of words on a web page that match a specified set of keywords. In the context of search engine optimization, keyword density can be used as a factor in determining whether a web page is relevant to a specified keyword or keyword phrase. Because keyword density is easy to manipulate, search engines usually implement other measures of relevancy to prevent unscrupulous webmasters from creating search spam through practices such as keyword stuffing.
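The definition above can be illustrated with a short Python sketch that computes keyword density as the share of a page's words that appear in a given keyword set. The tokenizer (lowercase runs of letters, digits and apostrophes) and the function name `keyword_density` are assumptions made for this example. It matches single words only, so a multi-word phrase would need n-gram matching.

```python
import re

def keyword_density(text: str, keywords: set[str]) -> float:
    """Percentage of words in `text` that belong to `keywords` (single words only)."""
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    targets = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in targets)
    return 100.0 * hits / len(words)

page = "Cheap flights to Paris. Book cheap flights today."
print(keyword_density(page, {"cheap", "flights"}))  # 4 of 8 words -> 50.0
```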


KEYWORD STUFFING


Keyword stuffing

From Wikipedia, the free encyclopedia


Keyword stuffing is considered to be an unethical search engine optimization (SEO) technique. Keyword stuffing occurs when a web page is loaded with keywords in the meta tags or in content. The repetition of words in meta tags may explain why many search engines no longer use these tags.

Keyword stuffing is used to obtain maximum search engine ranking and visibility for particular phrases. A word that is repeated too often may raise a red flag to search engines.
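Search engines do not publish how much repetition counts as "too often", so the following Python sketch only illustrates the idea. It flags any non-stop word that occurs at least `min_count` times and also makes up more than `max_share` of the page's words. Both thresholds and the small stop-word list are hypothetical values chosen for this example.

```python
import re
from collections import Counter

# Hypothetical stop words and thresholds; no search engine publishes its own.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "or", "the", "to"}

def suspicious_words(text: str, max_share: float = 0.10, min_count: int = 3) -> list[str]:
    """Return words whose repetition looks like keyword stuffing."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]
    if not words:
        return []
    counts = Counter(words)
    return sorted(
        w for w, n in counts.items()
        if n >= min_count and n / len(words) > max_share
    )

print(suspicious_words("shoes shoes shoes buy shoes cheap shoes online shoes store"))
# ['shoes']
```

The `min_count` guard keeps short, ordinary sentences from being flagged just because each word makes up a large fraction of very few words.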

Hiding text out of the visitor's view is done in many different ways. Common techniques include coloring text to blend with the background, using CSS z-index positioning to place text "behind" an image (and therefore out of the visitor's view), and using CSS absolute positioning to move text far away from the visible page area. As of 2005, some of these invisible text techniques can be detected by major search engines.

"Noscript" tags are another way to place hidden content within a page. While they are a valid optimization method for displaying an alternative representation of scripted content, they may be abused, since search engines may index content that is invisible to most visitors.

Inserted text sometimes includes words that are frequently searched (such as "sex"), even if those terms bear little connection to the content of a page, in order to attract traffic to advert-driven pages.



KEYWORD & SEARCH ENGINE OPTIMIZATION

Search engine optimization
From Wikipedia, the free encyclopedia


Search engine optimization (SEO) is a set of methods aimed at improving the ranking of a website in search engine listings, and could be considered a subset of search engine marketing. The term SEO also refers to "search engine optimizers," an industry of consultants who carry out optimization projects on behalf of clients' sites. Some commentators, and even some SEOs, break down methods used by practitioners into categories such as "white hat SEO" (methods generally approved by search engines, such as building content and improving site quality), or "black hat SEO" (tricks such as cloaking and spamdexing). White hatters charge that black hat methods are an attempt to manipulate search rankings unfairly. Black hatters counter that all SEO is an attempt to manipulate rankings, and that the particular methods one uses to rank well are irrelevant.

Search engines display different kinds of listings in the search engine results pages (SERPs), including: pay per click advertisements, paid inclusion listings, and organic search results. SEO is primarily concerned with advancing the goals of a website by improving the number and position of its organic search results for a wide variety of relevant keywords. SEO strategies may increase both the number and quality of visitors. Search engine optimization is sometimes offered as a stand-alone service, or as a part of a larger marketing effort, and can often be very effective when incorporated into the initial development and design of a site.

For competitive, high-volume search terms, the cost of pay per click advertising can be substantial. Ranking well in the organic search results can provide the same targeted traffic at potentially significant savings. Site owners may choose to optimize their sites for organic search if the cost of optimization is less than the cost of advertising.

Not all sites have identical goals for search optimization. Some sites seek any and all traffic, and may be optimized to rank highly for common search phrases. A broad search optimization strategy can work for a site that has broad interest, such as a periodical, a directory, or a site that displays advertising with a CPM revenue model. In contrast, many businesses try to optimize their sites for large numbers of highly specific keywords that indicate readiness to buy. Overly broad search optimization can hinder marketing strategy by generating a large volume of low-quality inquiries that cost money to handle, yet result in little business. Focusing on desirable traffic generates better quality sales leads, resulting in more sales. Search engine optimization can be very effective when used as part of a smart niche marketing strategy.
Contents

* 1 History
o 1.1 Early search engines
o 1.2 Organic search engines
* 2 The relationship between SEO and the search engines
* 3 Getting into search engines' listings
* 4 White hat methods
* 5 Black hat methods
* 6 SEO and Marketing
* 7 Legal issues
* 8 See also
* 9 References
* 10 External links
o 10.1 Additional research resources
o 10.2 Search engines' guidelines
o 10.3 Sources of background information


History

Early search engines

Webmasters and content providers began optimizing sites for search engines in the mid-1990s, as the first search engines were cataloging the early Web. Initially, all a webmaster needed to do was submit a site to the various engines. These engines would run spiders (programs that "crawl" the site) and store the collected data. By default, an engine scanned an entire webpage for words matching the search. As a result, a page with many different words matched more searches, and a webpage containing a dictionary-type listing would match almost all searches, limited only by unique names. The search engines then sorted the information by topic and served results based on pages they had crawled. As the number of documents online kept growing, and more webmasters realized the value of organic search listings, some popular search engines began to sort their listings so they could display the most relevant pages first. This was the start of a friction between search engines and webmasters that continues to this day.

At first, search engines were guided by the webmasters themselves. Early versions of search algorithms relied on webmaster-provided information such as category and keyword meta tags, or index files in engines like ALIWEB. Meta tags provided a guide to each page's content. When some webmasters began to abuse meta tags, causing their pages to rank for irrelevant searches, search engines abandoned their consideration of meta tags and instead developed more complex ranking algorithms. These algorithms took a more diverse set of factors into account. They also gave weight to a limited number of words per page rather than to every word (an "anti-dictionary" measure). The factors included:

* Text within the title tag
* Domain name
* URL directories and file names
* HTML tags: headings, emphasized (<em>) and strongly emphasized (<strong>) text
* Term frequency, both in the document and globally, often misunderstood and mistakenly referred to as Keyword density
* Keyword proximity
* Keyword adjacency
* Keyword sequence
* Alt attributes for images
* Text within NOFRAMES tags
* Content development
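The "term frequency, both in the document and globally" item can be made concrete with TF-IDF. This standard information-retrieval weighting multiplies how often a term appears in one page by how rare the term is across a collection. It is shown here as a general technique, not as the formula any particular search engine used. The smoothed IDF variant and the tiny corpus are assumptions for the example.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_idf(term: str, doc: str, corpus: list[str]) -> float:
    """Weight of `term` in `doc`: frequency in the page times rarity across `corpus`."""
    term = term.lower()
    words = tokenize(doc)
    if not words:
        return 0.0
    tf = Counter(words)[term] / len(words)                   # frequency in the document
    df = sum(1 for d in corpus if term in set(tokenize(d)))   # pages containing the term
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1          # smoothed inverse document frequency
    return tf * idf

corpus = ["seo tips for beginners", "seo and search engines", "baking bread at home"]
# "bread" is rarer across the corpus than "seo", so it scores higher in its page.
print(tf_idf("bread", corpus[2], corpus) > tf_idf("seo", corpus[0], corpus))  # True
```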

Pringle et al. (1998) [1] also defined a number of attributes within the HTML source of a page that web content providers often manipulated when attempting to rank well in search engines. However, because search engines relied so extensively on factors that were still within the webmasters' exclusive control, they continued to suffer from abuse and ranking manipulation. To provide better results to their users, search engines had to ensure their SERPs showed the most relevant search results. The alternative was useless pages stuffed with keywords by unscrupulous webmasters, who used a bait-and-switch lure to display unrelated webpages. This led to the rise of a new kind of search engine.

Organic search engines

Google was started by two PhD students at Stanford University, Sergey Brin and Larry Page, and brought a new concept to evaluating web pages. This concept, called PageRank, has been important to the Google algorithm from the start [2]. PageRank relies heavily on incoming links and uses the logic that each link to a page is a vote for that page's value: the more incoming links a page has, the more "worthy" it is. The value of each incoming link varies directly with the PageRank of the page it comes from and inversely with the number of outgoing links on that page.
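The link-voting idea above can be sketched in Python as a power iteration. This follows the commonly cited normalized form of PageRank with a damping factor of 0.85, the value Brin and Page reported using. The toy link graph, the fixed iteration count, and the choice to spread the rank of a page with no outgoing links evenly across all pages are assumptions for this example, not Google's production algorithm. In the code, each link contributes the linking page's rank divided by that page's number of outgoing links, as the paragraph describes.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Estimate PageRank by power iteration.

    `links` maps each page to the pages it links to; every linked page must
    also be a key. Ranks always sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share of (1 - damping).
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A link's vote: the source's rank split across its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:
                # Assumed handling of dead ends: spread rank evenly over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(ranks, key=ranks.get))  # C, which has three incoming links
```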

With help from PageRank, Google proved to be very good at serving relevant results. Google became the most popular and successful search engine. Because PageRank measured an off-site factor, Google felt it would be more difficult to manipulate than on-page factors.

However, webmasters had already developed link building tools and schemes to influence the Inktomi search engine. These methods proved to be equally applicable to Google's algorithm. Many sites focused on exchanging, buying, and selling links on a massive scale. PageRank's reliance on the link as a vote of confidence in a page's value was undermined as many webmasters sought to garner links purely to influence Google into sending them more traffic, irrespective of whether the link was useful to human site visitors.

Further complicating the situation, search engines still scanned an entire webpage for matching words by default. A webpage containing a dictionary-type listing would therefore still match almost all searches (except those for unusual names), and it could now rank even higher if it also had many incoming links. Dictionary pages and link schemes could severely skew search results.

It was time for Google, and other search engines, to look at a wider range of off-site factors. There were also other reasons to develop more intelligent algorithms. The Internet was reaching a vast population of non-technical users who were often unable to use advanced querying techniques to find the information they were seeking. In addition, the sheer volume and complexity of the indexed data was vastly different from that of the early days. Search engines had to develop predictive, semantic, linguistic and heuristic algorithms. Around the same time as the work that led to Google, IBM had begun work on the Clever Project [3], and Jon Kleinberg was developing the HITS algorithm.

A proxy for the PageRank metric is still displayed in the Google Toolbar, but PageRank is only one of more than 100 factors that Google considers in ranking pages.

Today, most search engines keep their methods and ranking algorithms secret, both to compete on delivering the most valuable search results and to keep spam pages from clogging those results. A search engine may use hundreds of factors in ranking the listings on its SERPs, and both the factors and the weight each carries may change continually. Algorithms can differ widely: a webpage that ranks #1 in one search engine could rank #200 in another.

Much current SEO thinking on what works and what doesn't is largely speculation and informed guesses. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization.

The following factors are speculation about some of the considerations search engines may presently be using or could build into their algorithms. A number of these are taken from one of Google's patent applications [4], and may give some indication of what is in the pipeline. Some are pure speculation. It is also worth keeping in mind that Google has over 180 patents and patent applications assigned to it at the US Patent and Trademark Office (USPTO). A number of those may offer insights into other factors and other directions the search engine may follow, some of which may not be consistent with this list.

* Age of site
* Length of time the domain has been registered
* Age of content
* Frequency of content: regularity with which new content is added
* Text size: number of words above 200-250 (not affecting Google in 2005)
* Age of link and reputation of linking site (authority)
* Standard on-site factors
* Negative scoring for on-site factors (for example, a dampening for websites with extensive keyword meta tags indicative of having been optimized, i.e. "SEO-ed")
* Uniqueness of content
* Related terms used in content (the terms that the search engine associates as being related to the main content of the page)
* Google PageRank (only used in Google's algorithm)
* External links, the anchor text in those external links and in the sites/pages containing those links
* Citations and research sources (indicating the content is of research quality)
* Stem-related terms in the search engine's database (for example, finance/financing)
* Incoming backlinks and anchor text of incoming backlinks
* Negative scoring for some incoming backlinks (perhaps those coming from low value pages, reciprocated backlinks, etc.)
* Rate of acquisition of backlinks: too many too fast could indicate "unnatural" link buying activity
* Text surrounding outward links and incoming backlinks. A link following the words "Sponsored Links" could be ignored
* Use of "rel=nofollow" to suggest that the search engine should ignore the link
* Depth of document in site
* Metrics collected from other sources, such as monitoring how frequently users hit the back button when SERPs send them to a particular page
* Metrics collected from sources like the Google Toolbar, Google Analytics, Google AdWords/Adsense programs, etc.
* Metrics collected in data-sharing arrangements with third parties (like providers of statistical programs used to monitor site traffic)
* Rate of removal of incoming links to the site
* Use of sub-domains, use of keywords in sub-domains and volume of content on sub-domains… and negative scoring for such activity
* Semantic connections of hosted documents
* Rate of document addition or change
* IP of hosting service and the number/quality of other sites hosted on that IP
* Other affiliations of linking site with the linked site (do they share an IP? have a common postal address on the "contact us" page?)
* Technical matters like use of 301 or 302 to redirect moved pages, showing a 404 server header rather than a 200 server header for pages that don't exist, proper use of robots.txt
* Hosting uptime
* Whether the site serves different content to different categories of users (cloaking)
* Broken outgoing links not rectified promptly
* Unsafe or illegal content
* Quality of HTML coding, presence of coding errors
* Actual click-through rates observed by the search engines for listings displayed on their SERPs
* Hand ranking by humans of the most frequently accessed SERPs
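The "technical matters" item above covers using 301 rather than 302 for moved pages, and sending a real 404 rather than a 200 for missing pages. A minimal WSGI application built only on the Python standard library can illustrate both. The page contents and URL paths are hypothetical. The point is only which HTTP status each kind of request receives.

```python
# Hypothetical site content, used only for this illustration.
PAGES = {"/": "<h1>Home</h1>", "/products": "<h1>Products</h1>"}
MOVED = {"/old-products": "/products"}  # pages that have permanently moved

def app(environ, start_response):
    """Minimal WSGI app that returns crawler-friendly status codes."""
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # 301 signals a permanent move; a 302 would signal a temporary one.
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGES[path].encode("utf-8")]
    # A genuine 404, rather than a 200 "soft error" page, says the page does not exist.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000.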


The relationship between SEO and the search engines

The first mentions of Search Engine Optimization don't appear on Usenet until 1997, a few years after the launch of the first Internet search engines. The operators of search engines recognized quickly that some people from the webmaster community were making efforts to rank well in their search engines, and even manipulating the page rankings in search results. In some early search engines, such as Infoseek, ranking first was as easy as grabbing the source code of the top-ranked page, placing it on your website, and submitting a URL to instantly index and rank that page.

Due to the high value and targeting of search results, there is potential for an adversarial relationship between search engines and SEOs. In 2005, an annual conference named AirWeb was created to discuss bridging the gap and minimizing the sometimes damaging effects of aggressive web content providers.

Some more aggressive site owners and SEOs generate automated sites or employ techniques which eventually get domains banned from the search engines. Many search engine optimization companies, which sell services, employ long-term, low-risk strategies, and most SEO firms that do employ high-risk strategies do so on their own affiliate, lead-generation, or content sites, instead of risking client websites.

Some SEO companies employ aggressive techniques that get their client websites banned from the search results. The Wall Street Journal profiled a company which allegedly used high-risk techniques and failed to disclose those risks to its clients.[5] Wired reported that the same company sued a blogger for mentioning that they were banned.[6] Google's Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.[7]

Google has enforced webpage restrictions for years, such as against hidden text (background and foreground colors of the same hue). In 2006, Google could automatically penalize a non-standard website the next day by blocking it from search results for 30-35 days (or longer), pending a reinclusion request. If the site was reinstated, Google could revert the index to old, expired or deleted webpages from a year earlier, delaying the re-indexing of the current website for a total of 2-4 months.

Yahoo! and MSN Search do not automatically punish entire websites for small amounts of accidental hidden text.[citation needed] Google's market share of daily searches has fallen rapidly from 75% to 56% over the past few years, as other search engines find many valuable webpages that Google has banned and cannot display due to Google's severely limited index.[citation needed] In early 2006, MSN Search typically re-indexed small websites every 14 days, and Yahoo! also re-indexed quickly, much faster than Google, but all three MSN/Yahoo!/Google could require more than a month to index a new page (new file name) on an old website.

Some search engines have also reached out to the SEO industry, and are frequent sponsors and guests at SEO conferences and seminars. In fact, with the advent of paid inclusion, some search engines now have a vested interest in the health of the optimization community. All of the main search engines provide information/guidelines to help with site optimization: Google's, Yahoo!'s, MSN's and Ask.com's. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website and also provides data on Google traffic to the website. Yahoo! has SiteExplorer that provides a way to submit your URLs for free (like MSN/Google), determine how many pages are in the Yahoo! index and drill down on inlinks to deep pages. Yahoo! has an Ambassador Program and Google has a program for qualifying Google Advertising Professionals.

Getting into search engines' listings

New sites do not need to be "submitted" to search engines to be listed. However, Google and Yahoo! offer submission programs, such as Google Sitemaps, which let a webmaster create and submit an XML feed of the site's pages. Generally, though, a simple link from an established site will get the search engines to visit the new site and begin to spider its contents. After such a link is acquired, it can take a few days or even weeks for all the main search engine spiders to begin visiting and indexing the new site.
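A sitemap feed of the kind mentioned above is an XML list of page URLs. The Python sketch below builds one with the standard library's `xml.etree.ElementTree`. It assumes the sitemaps.org 0.9 schema, the shared format the major engines later adopted. The 2006-era Google Sitemaps program used its own earlier schema namespace, so treat the namespace as an assumption. Only the required `<loc>` element is emitted, and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed schema: the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap XML document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes "&" and "<"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/catalog?item=12&desc=vacation",
]))
```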

Once the search engine has found the new site, it will generally visit and start to index the pages on the site, as long as all the pages are linked to with anchor tag hyperlinks. Spiders may not be able to find pages that are accessible only through Flash or JavaScript links.
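To see why only anchor-tag links are reliably followed, consider a deliberately simple spider that collects `href` values from `<a>` tags using Python's standard `html.parser`. It skips `javascript:` URLs and never runs scripts, so a link that exists only inside JavaScript (or Flash) is invisible to it. This is an illustrative sketch, not a description of any particular engine's crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, value))

def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<a href="/about">About</a> <a href="javascript:go(2)">Next</a> '
        '<span onclick="go(3)">More</span>')
print(extract_links(page, "http://www.example.com/index.html"))
# ['http://www.example.com/about']
```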

Search engine crawlers may look at a number of different factors when crawling a site, and many pages from a site may not be indexed until they gain more PageRank, links or traffic. The distance of a page from the root directory of a site may also affect whether it gets crawled, as may other importance metrics. Cho et al. (1998) [8] described some standards for deciding which pages a crawler should visit and send on for inclusion in a search engine's index.
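Cho et al. compared several ways of ordering a crawler's queue, including ordering by how many links point to each URL. The Python sketch below applies that backlink-count idea to an in-memory link graph. It always visits next the discovered URL with the most known in-links, breaking ties by link depth from the start page and then alphabetically. The graph, the tie-breaking rule and the page limit are assumptions for this example, not details taken from the paper.

```python
import heapq

def crawl_order(start: str, links: dict[str, list[str]], limit: int) -> list[str]:
    """Order a crawl by known in-link count (ties: shallower depth, then URL)."""
    in_links = {start: 0}
    depth = {start: 0}  # link distance from the start page, fixed at discovery
    visited: list[str] = []
    done: set[str] = set()
    frontier = [(0, 0, start)]  # heap entries: (-in_link_count, depth, url)
    while frontier and len(visited) < limit:
        neg_count, d, url = heapq.heappop(frontier)
        if url in done or -neg_count != in_links[url]:
            continue  # already crawled, or an outdated entry superseded by a newer one
        done.add(url)
        visited.append(url)
        for target in links.get(url, []):
            if target not in depth:
                depth[target] = d + 1
                in_links[target] = 0
            in_links[target] += 1
            if target not in done:
                heapq.heappush(frontier, (-in_links[target], depth[target], target))
    return visited

site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/c", "/d"],
    "/b": ["/c", "/d", "/e"],
}
print(crawl_order("/", site, limit=10))
# ['/', '/a', '/c', '/b', '/d', '/e']
```

Here "/c" is crawled before "/b" because it gains a second in-link while "/a" is being processed.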

Webmasters can instruct spiders to not index certain files or directories through the standard robots.txt file in the root directory of the domain. Standard practice requires a search engine to check this file upon visiting the domain, though a search engine crawler will keep a cached copy of this file as it visits the pages of a site, and may not update that copy as quickly as a webmaster does. The web developer can use this feature to prevent pages such as shopping carts or other dynamic, user-specific content from appearing in search engine results, as well as keeping spiders from endless loops and other spider traps.
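Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`, which is enough to sketch how a well-behaved crawler honors the file. The rules below, which block a shopping cart and a calendar that could trap a spider in an endless loop, are hypothetical, as is the crawler name `ExampleBot`. A real crawler would fetch the file with `set_url()` and `read()` and, as noted above, keep a cached copy rather than re-reading it for every page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as if served at http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /calendar/
"""

def make_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text that was already fetched
    return parser

rules = make_parser(ROBOTS_TXT)
print(rules.can_fetch("ExampleBot", "http://www.example.com/products/shoes"))  # True
print(rules.can_fetch("ExampleBot", "http://www.example.com/cart/checkout"))   # False
```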

For search engines that have their own paid submission program (like Yahoo!), paying a nominal fee for submission may save some time. Yahoo!'s paid submission program guarantees inclusion in its search results, but does not guarantee a specific ranking within those results.

White hat methods

White hat methods of SEO involve following the search engines' guidelines as to what is and what isn't acceptable. Their advice generally is to create content for the user, not the search engines; to make that content easily accessible to their spiders; and to not try to game the system. Often, webmasters make critical mistakes when designing or setting up their websites, inadvertently "poisoning" them so that they will not rank well. White hat SEOs attempt to discover and correct mistakes, such as machine-unreadable menus, broken links, temporary redirects, or a poor navigation structure.

Because search engines are text-centric, many of the same methods that are useful for web accessibility are also advantageous for SEO. A detailed case for this common ground, cited by the W3C in "Developing a Web Accessibility Business Case", is "SEO A Positive Influence on Web Accessibility". Google has brought SEO and accessibility even closer with the release of Google Accessible Web Search, which prioritises accessible websites.


Methods are available for optimizing graphical content, including ALT attributes, and adding a text caption. Even Flash animations can be optimized by designing the page to include alternative content in case the visitor cannot read Flash.

Some methods considered proper by the search engines:

* Using unique and relevant title to name each page.
* Editing web pages to replace vague wording with specific terminology relevant to the subject of the page: terms that the site's intended audience will expect to see on the pages and will search with to find them.
* Increasing the amount of unique content on the site.
* Writing quality content for the website visitors instead of the search engines.
* Using a reasonably-sized, accurate description meta tag without excessive use of keywords, exclamation marks or off topic terms.
* Ensuring that all pages are accessible via anchor tag hyperlinks, and not only via Java, JavaScript or Macromedia Flash applications or meta refresh redirection. This can be done with text-based links in site navigation and with a page listing all the contents of the site (a site map).
* Allowing search engine spiders to crawl pages without having to accept session IDs or cookies.
* Developing "link bait" strategies. High quality websites that offer interesting content or novel features tend to accumulate large numbers of backlinks.
* Participating in a web ring with other quality websites.
* Writing useful, informational articles under a Creative Commons or other open source license, in exchange for attribution to the author by hyperlink.
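One way to let spiders crawl without session IDs is to keep session identifiers out of the URLs a site links to, so visitors and crawlers alike see one stable URL per page. The Python sketch below strips such parameters from a URL. The parameter names in `SESSION_PARAMS` are hypothetical examples, since each site or framework chooses its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example session parameter names; the exact names vary from site to site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_id(url: str) -> str:
    """Return `url` without session-ID query parameters (names compared case-insensitively)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_session_id("http://www.example.com/item?id=42&PHPSESSID=abc123"))
# http://www.example.com/item?id=42
```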


Black hat methods

Main article: Spamdexing

"Black hat" SEO refers to methods of improving rankings that the search engines disapprove of, typically because they consider such methods deceptive and unrelated to providing quality content to site visitors. Search engines often penalize sites they discover using black hat methods, either by reducing their rankings or by eliminating their listings from the SERPs altogether. Such penalties are usually applied automatically by the search engines' algorithms, because the Internet is too large for manual policing of websites to be feasible.

Spamdexing is the promotion of irrelevant, chiefly commercial, pages through deceptive techniques and the abuse of the search algorithms. Over time a widespread consensus has developed in the industry as to what are and are not acceptable means of boosting one's search engine placement and resultant traffic.

Spamdexing often gets confused with white hat search engine optimization techniques, which do not involve deceit. Spamming involves getting websites more exposure than they deserve for their keywords, leading to unsatisfactory search results. Optimization involves getting websites the rank they deserve on the most targeted keywords, leading to satisfactory search experiences.

When discovered, search engines may take action against those found to be using unethical SEO methods. In February 2006, Google removed both BMW Germany and Ricoh Germany for use of these practices.[9]

Cloaking is the practice of serving one version of a page to search engine spiders/bots and another version to human visitors.
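Because cloaking means serving different content depending on who is asking, one simple way to test for it is to request the same URL twice and compare the responses: once identifying as a crawler, once as a browser. The Python sketch below does this with `difflib.SequenceMatcher`. The user-agent strings and the 0.8 similarity threshold are hypothetical. Legitimately dynamic pages (rotating ads, timestamps) also differ between requests, and a cloaking site may key on the requester's IP address rather than its user agent, so a check like this is only a rough heuristic.

```python
import urllib.request
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical user-agent strings for the two requests.
CRAWLER_UA = "ExampleBot/1.0"
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleBrowser/1.0)"

def http_fetch(url: str, user_agent: str) -> str:
    """Fetch `url` over HTTP, sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def looks_cloaked(url: str, fetch: Callable[[str, str], str] = http_fetch,
                  threshold: float = 0.8) -> bool:
    """True if the crawler's copy and the browser's copy differ substantially."""
    as_crawler = fetch(url, CRAWLER_UA)
    as_browser = fetch(url, BROWSER_UA)
    return SequenceMatcher(None, as_crawler, as_browser).ratio() < threshold
```

Passing a stand-in `fetch` function (for example, one that returns canned strings) lets the check be exercised without network access.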

SEO and Marketing

A considerable body of SEO practitioners see search engines as just another visitor to a site, and try to make the site as accessible to those visitors as to any other who would come to the pages. They often see the white hat/black hat dichotomy mentioned above as a false dilemma. The focus of their work isn't primarily to rank highest for certain terms in search engines, but rather to help site owners fulfill the business objectives of their sites. Indeed, ranking well for a few terms among the many possibilities does not guarantee more sales. A successful Internet marketing campaign may drive organic search traffic to pages. It may also involve paid advertising on search engines and other pages, building high-quality web pages to engage and persuade, and addressing technical issues that may keep search engines from crawling and indexing those sites. Other parts of such a campaign include setting up analytics programs so site owners can measure their successes, and making sites accessible and usable.

SEOs may work in-house for an organization or as consultants, and search engine optimization may be only part of their daily functions. Their knowledge of how search engines function often comes from discussions on forums and blogs, from popular conferences and seminars, and from experimentation on their own sites. Few college courses cover online marketing from an ecommerce perspective in a way that keeps up with the changes the web sees on a daily basis.

While meeting the guidelines posted by search engines can help build a solid foundation for success on the web, such efforts are only a start. Many see search engine marketing as a larger umbrella under which search engine optimization fits. It is also possible that many who focused primarily on SEO in the past are incorporating more and more marketing ideas into their efforts. They recognize that search engines themselves have expanded their coverage to include RSS feeds, video search, local results, mapping, and other novel services, which makes an SEO firm look increasingly like an ad agency.

Legal issues

In 2002, SearchKing filed suit in an Oklahoma court against the search engine Google. SearchKing's claim was that Google's tactics to prevent spamdexing constituted an unfair business practice. This may be compared to lawsuits which email spammers have filed against spam-fighters, as in various cases against MAPS and other DNSBLs. In January of 2003, the court pronounced a summary judgment in Google's favor.

See also

* Google consultant
* SEO contest
* Spamdexing
* Search engine marketing
* Search Engine Marketing Professional Organization
* Search Marketing Association - North America
* Free content
* Web syndication
* Content development
* Organization of Search Engine Optimization Professionals


References

* [1] Pringle, G., Allison, L., and Dowe, D. (1998). "What is a tall poppy among web pages?". Proceedings of the seventh conference on World Wide Web.
* [2] Brin, S., and Page, L. (1998). "The Anatomy of a Large-Scale Hypertextual Web Search Engine". Proceedings of the seventh international conference on World Wide Web 7, 107-117.
* [3] "The Clever Project". Retrieved on May 4, 2006.
* [4] "Google Patent Application: Information Retrieval Based on Historical Data". Retrieved on October 10, 2005.
* [5] Kesmodel, D. "'Optimize' Rankings At Your Own Risk". The Wall Street Journal Online. Retrieved on September 9, 2005.
* [6] Penenberg, A. L. "Legal Showdown in Search Fracas". Wired.com. Retrieved on September 8, 2005.
* [7] Cutts, M. "Confirming a penalty". Matt Cutts Blog. Retrieved on February 2, 2006.
* [8] Cho, J., Garcia-Molina, H., and Page, L. (1998). "Efficient crawling through URL ordering". Proceedings of the seventh conference on World Wide Web.
* [9] Cutts, M. "Ramping up on international webspam". MattCutts.com/Blog/. Retrieved on February 4, 2006.


External links

Additional research resources

* Our Search: Google Technology
* Google: Company Overview


Search engines' guidelines

* Google's Guidelines on SEO's
* Google's Guidelines on Site Design
* Yahoo! Search Content Quality Guidelines
* MSN Search Guidelines for successful indexing
* Editorial Guidelines for Ask.com


Sources of background information

* Matt Cutts Blog - Matt Cutts is the only Senior Engineer from Google who actively communicates with the SEO/SEM community
* SearchEngineWatch.com - Search Engine News, Resources, Forums, Organizer of SES (Search Engine Strategies) Summit, Daily Search Cast SEO News and Podcast.
* Thread Watch - Open Internet Marketing Community - News, Alerts and Articles. If something happens in SEO/SEM, TW is often the first to report.
* Webmasterworld.com - Search Engine Optimization Forum. WMW is known to be watched by Google and posted to anonymously by a Google employee (Google Guy).
* SES Search Engine Strategies - Foremost Conference and Expo of the SEO Industry

Retrieved from "http://en.wikipedia.org/wiki/Search_engine_optimization"



KEYWORD IN CONTEXT

KWIC
From Wikipedia, the free encyclopedia


KWIC is an acronym for Keyword In Context, the most common format for concordance lines.

A KWIC index is formed by sorting and aligning the words within an article title so that each word in the title (except stop words) can be looked up alphabetically in the index. It was a useful indexing method for technical manuals before computerized full-text search became common. For example, the title statement of this article and the Wikipedia slogan would appear as follows in a KWIC index. A KWIC index usually uses a wide layout to show as much surrounding context as possible (not shown in the following example).
                       KWIC is an  acronym      for Keyword In Context, ...       page 1
 ... Keyword In Context, the most  common       format for concordance lines.     page 1
   ... the most common format for  concordance  lines.                            page 1
 ... is an acronym for Keyword In  Context,     the most common format ...        page 1
              Wikipedia, The Free  Encyclopedia                                   page 0
  ... In Context, the most common  format       for concordance lines.            page 1
                   Wikipedia, The  Free         Encyclopedia                      page 0
           KWIC is an acronym for  Keyword      In Context, the most ...          page 1
                                   KWIC         is an acronym for Keyword ...     page 1
... common format for concordance  lines.                                         page 1
  ... for Keyword In Context, the  most         common format for concordance ... page 1
                                   Wikipedia,   The Free Encyclopedia             page 0
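The rotate-and-sort procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a historical indexing program: the stop-word list, the four-word context window, whitespace tokenization, and the names kwic_index and format_index are illustrative choices. Run on the two titles from the example, it yields the same twelve keywords in the same order, though with a different amount of context around each one.

```python
# A minimal KWIC (Keyword In Context) index builder.
# Assumptions: a small hypothetical stop-word list, whitespace tokenization,
# and case-insensitive alphabetical sorting on the keyword.
STOP_WORDS = {"a", "an", "the", "is", "for", "in", "of", "to", "and"}


def kwic_index(titles, width=4):
    """Return (left_context, keyword, right_context, page) tuples sorted by keyword.

    titles: list of (title_text, page) pairs.
    width: how many words of context to keep on each side of the keyword.
    """
    entries = []
    for text, page in titles:
        words = text.split()
        for i, word in enumerate(words):
            # Ignore trailing punctuation for the stop-word test and the sort key.
            bare = word.strip(".,;:").lower()
            if bare in STOP_WORDS:
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            entries.append((bare, left, word, right, page))
    entries.sort(key=lambda e: e[0])
    return [(left, word, right, page) for _, left, word, right, page in entries]


def format_index(entries):
    """Right-align the left context so every keyword starts in the same column."""
    return [f"{left:>30}  {word:<14}{right:<40}page {page}"
            for left, word, right, page in entries]


titles = [
    ("KWIC is an acronym for Keyword In Context, the most common format "
     "for concordance lines.", 1),
    ("Wikipedia, The Free Encyclopedia", 0),
]
entries = kwic_index(titles)
for line in format_index(entries):
    print(line)
```

Right-aligning the left context is what keeps the keyword column straight; a wider layout only needs larger width values.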

See also

* Burrows-Wheeler transform
* Hans Peter Luhn
* KWOC
* KWAC
* KLIC


KEYWORD & SQL


From Wikipedia, the free encyclopedia


SQL (commonly expanded to Structured Query Language; see History for the term's derivation) is the most widely used computer language for creating, modifying, retrieving and manipulating data in relational database management systems. The language has evolved beyond its original purpose to support object-relational database management systems. It is an ANSI/ISO standard.

SQL is pronounced either as an initialism, ess-cue-el, which is regarded as more formal, or as a single word that sounds like the English word sequel. Database products (or projects) with the letters SQL in their names each follow their own convention: MySQL is commonly pronounced my ess-cue-el; PostgreSQL is usually shortened to postgres; and Microsoft SQL Server is commonly spoken as Microsoft-sequel-server.
Contents

* 1 History
o 1.1 Standardization
* 2 Scope
* 3 SQL keywords
o 3.1 Data retrieval
o 3.2 Data manipulation
o 3.3 Data transaction
o 3.4 Data definition
o 3.5 Data control
o 3.6 Other
* 4 Database systems using SQL
* 5 Criticisms of SQL
* 6 Alternatives to SQL
* 7 References
* 8 External links


History

An influential paper, "A Relational Model of Data for Large Shared Data Banks", by Dr. Edgar F. Codd, was published in June 1970 in the Association for Computing Machinery (ACM) journal, Communications of the ACM, although drafts of it were circulated internally within IBM in 1969. Codd's model became widely accepted as the definitive model for relational database management systems (RDBMS or RDMS).

During the 1970s, a group at IBM's San Jose research center developed a database system, System R, based upon, but not strictly faithful to, Codd's model. Structured English Query Language ("SEQUEL") was designed to manipulate and retrieve data stored in System R. The acronym SEQUEL was later condensed to SQL because the word 'SEQUEL' was held as a trademark by the UK aircraft company Hawker Siddeley. Although SQL was influenced by Codd's work, Donald D. Chamberlin and Raymond F. Boyce at IBM were the authors of the SEQUEL language design.[1] Their concepts were published to increase interest in SQL.

The first non-commercial, relational, non-SQL database, Ingres, was developed in 1974 at U.C. Berkeley.

In 1978, methodical testing commenced at customer test sites. Demonstrating both the usefulness and practicality of the system, this testing proved to be a success for IBM. As a result, IBM began to develop commercial products based on their System R prototype that implemented SQL, including the System/38 (announced in 1978 and commercially available in August 1979), SQL/DS (introduced in 1981), and DB2 (in 1983).[1]

At the same time, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Chamberlin and Boyce and developed its own RDBMS for the Navy, the CIA and others. In the summer of 1979, Relational Software, Inc. introduced Oracle V2 (Version 2) for VAX computers as the first commercially available implementation of SQL. Oracle is often incorrectly cited as beating IBM to market by two years, when in fact it beat IBM's release of the System/38 by only a few weeks. Considerable public interest then developed; soon many other vendors developed versions, and Oracle's future was ensured.



Standardization

SQL was adopted as a standard by ANSI (American National Standards Institute) in 1986 and by ISO (International Organization for Standardization) in 1987. ANSI has declared that the official pronunciation of SQL is /ɛs kjuː ɛl/, although many English-speaking database professionals still pronounce it as sequel; it has also picked up the informal nickname 'SQuirreL'.

The SQL standard has gone through a number of revisions:
Year  Name      Alias   Comments
----  --------  ------  --------
1986  SQL-86    SQL-87  First published by ANSI. Ratified by ISO in 1987.
1989  SQL-89            Minor revision.
1992  SQL-92    SQL2    Major revision.
1999  SQL:1999  SQL3    Added regular expression matching, recursive queries, triggers, non-scalar types and some object-oriented features. (The last two are somewhat controversial and not yet widely supported.)
2003  SQL:2003          Introduced XML-related features, window functions, standardized sequences and columns with auto-generated values (including identity-columns).

Scope

Although SQL is defined by both ANSI and ISO, there are many extensions to and variations on the version of the language defined by these standards bodies. Many of these extensions are proprietary, such as Oracle Corporation's PL/SQL, IBM's SQL PL (SQL Procedural Language), and the Transact-SQL dialect of Sybase and Microsoft. It is also fairly common for commercial implementations to omit support for basic features of the standard, such as the DATE or TIME data types, preferring some variant of their own. As a result, in contrast to ANSI C or ANSI Fortran, which can usually be ported from platform to platform without major structural changes, SQL code can rarely be ported between database systems without major modifications. There are several reasons for this lack of portability between database systems:

* the complexity and size of the SQL standard means that most databases do not implement the entire standard.
* the standard does not specify database behavior in several important areas (e.g. indexes), leaving it up to implementations of the standard to decide how to behave.
* the SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard's specification of the semantics of language constructs is less well-defined, leading to areas of ambiguity.
* many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.
* some believe the lack of compatibility between database systems is intentional in order to ensure vendor lock-in.

SQL is designed for a specific, limited purpose: querying data contained in a relational database. As such, it is a set-based, declarative computer language rather than an imperative language such as C or BASIC, which, as general-purpose programming languages, are designed to solve a much broader set of problems. Language extensions such as PL/SQL address this by turning SQL into a full-fledged programming language while maintaining the advantages of SQL. Another approach is to allow programming language code to be embedded in and interact with the database. For example, Oracle and others include Java in the database, while PostgreSQL allows functions to be written in a wide variety of languages, including Perl, Tcl, and C.

One joke about SQL is that "SQL is not structured, nor is it limited to queries, nor is it a language." This is founded on the notion that pure SQL is not a classic programming language since it is not Turing-complete. On the other hand, it is a programming language in the sense that it has a grammar, a syntax, and a programmatic purpose and intent. The joke recalls Voltaire's remark that the Holy Roman Empire was "not holy, nor Roman, nor an empire."

SQL contrasts with the more powerful database-oriented fourth-generation programming languages such as Focus or SAS in its relative functional simplicity and smaller command set. This greatly reduces the difficulty of maintaining SQL source code, but it also makes questions such as 'Who had the top ten scores?' harder to program, which led to the development of the procedural extensions discussed above. The same simplicity makes it possible for SQL source code to be produced (and optimized) by software, which led to a number of natural language database query languages, as well as 'drag and drop' database programming packages with 'object oriented' interfaces. These packages often allow the generated SQL source code to be examined for educational purposes, enhanced further, or reused in a different environment.
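To illustrate why a question like 'Who had the top ten scores?' is awkward in plain SQL, here is a sketch using Python's built-in sqlite3 module. The scores table, its columns, and the data are hypothetical, and N is reduced to 3 to keep the data small. Without a vendor extension such as LIMIT or TOP, one portable approach is a correlated subquery that counts how many rows beat each row.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate a "top N scores" query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("ann", 90), ("bob", 75), ("cy", 88), ("dee", 60), ("eve", 95)],
)

# Keep a row if fewer than TOP_N rows have a strictly higher score.
# No LIMIT/TOP is needed, but ties can make the result longer than TOP_N.
TOP_N = 3
rows = conn.execute(
    """
    SELECT s1.player, s1.score
    FROM scores AS s1
    WHERE (SELECT COUNT(*) FROM scores AS s2 WHERE s2.score > s1.score) < ?
    ORDER BY s1.score DESC
    """,
    (TOP_N,),
).fetchall()
print(rows)
```

The query re-scans the table once per row, which is one reason vendors added shortcuts for this kind of question.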

SQL keywords

SQL keywords fall into several groups.

Data retrieval

The most frequently used operation in transactional databases is data retrieval. When restricted to data retrieval commands, SQL acts as a functional language.

* SELECT is used to retrieve zero or more rows from one or more tables in a database. In most applications, SELECT is the most commonly used Data Manipulation Language command. When writing a SELECT query, the user describes the desired result set but does not specify what physical operations must be executed to produce it. Translating the query into an efficient query plan is left to the database system, more specifically to the query optimizer.
o Commonly available keywords related to SELECT include:
+ FROM is used to indicate from which tables the data is to be taken, as well as how the tables join to each other.
+ WHERE is used to identify which rows are to be retrieved, or passed on to GROUP BY. WHERE is evaluated before GROUP BY.
+ GROUP BY is used to combine rows with related values into elements of a smaller set of rows.
+ HAVING is used to identify which of the "combined rows" (produced when the query has a GROUP BY clause or when the SELECT part contains aggregates) are to be retrieved. HAVING acts much like WHERE, but it operates on the results of the GROUP BY and hence can use aggregate functions.
+ ORDER BY is used to identify which columns are used to sort the resulting data.
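The evaluation order of WHERE, GROUP BY and HAVING can be checked with a small experiment. This sketch uses Python's built-in sqlite3 module; the sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 50), ("south", 5), ("south", 7), ("east", 100)],
)

# WHERE drops individual rows before grouping (south's 5 never reaches GROUP BY).
# HAVING then filters the combined rows and may use aggregates (south's 7 fails).
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 7
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY region
"""
result = conn.execute(query).fetchall()
print(result)
```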

Example 1:
SELECT * FROM books
WHERE price > 100.00
ORDER BY title

This example retrieves the records from the books table whose price field is greater than 100.00. The result is sorted alphabetically by book title. The asterisk (*) means to show all columns of the books table; alternatively, specific columns could be named.
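Example 1 can be run against an in-memory database with Python's sqlite3 module. Only the books table and its title and price columns come from the example; the rows are hypothetical. The sketch names the title column explicitly instead of using the asterisk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical books table containing just the two columns Example 1 uses.
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.executemany("INSERT INTO books VALUES (?, ?)", [
    ("The Joy of SQL", 120.00),
    ("How SQL Saved my Dog", 15.50),
    ("Pitfalls of SQL", 101.25),
])

# Example 1 with a named column instead of *.
titles = [row[0] for row in conn.execute(
    "SELECT title FROM books WHERE price > 100.00 ORDER BY title")]
print(titles)
```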

Example 2:
SELECT books.title, count(*) AS Authors
FROM books
JOIN book_authors ON books.book_number = book_authors.book_number
GROUP BY books.title

Example 2 shows both a join across multiple tables and aggregation (grouping): it counts how many authors each book has. Example output may resemble:

Title                   Authors
----------------------  -------
SQL Examples and Guide        3
The Joy of SQL                1
How to use Wikipedia          2
Pitfalls of SQL               1
How SQL Saved my Dog          1
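Example 2 can be reproduced with Python's sqlite3 module. The schema is partly assumed: only books.title and the book_number join column appear in the example, so the author column and all rows are hypothetical. An ORDER BY is added because GROUP BY alone does not guarantee row order, and a third book with no authors shows that the inner join leaves it out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for Example 2.
conn.executescript("""
    CREATE TABLE books (book_number INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE book_authors (book_number INTEGER, author TEXT);
    INSERT INTO books VALUES
        (1, 'SQL Examples and Guide'), (2, 'The Joy of SQL'), (3, 'Untitled Draft');
    INSERT INTO book_authors VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'D');
""")

# Example 2 plus ORDER BY; book 3 has no author rows, so the JOIN drops it.
rows = conn.execute("""
    SELECT books.title, count(*) AS Authors
    FROM books
    JOIN book_authors ON books.book_number = book_authors.book_number
    GROUP BY books.title
    ORDER BY books.title
""").fetchall()
for title, authors in rows:
    print(f"{title:<24}{authors:>7}")
```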


Data manipulation

First there are the standard Data Manipulation Language (DML) elements. DML is the subset of the language used to add, update and delete data.

* INSERT is used to add zero or more rows (formally tuples) to an existing table.
* UPDATE is used to modify the values of a set of existing table rows.
* MERGE is used to combine the data of multiple tables. It is something of a combination of the INSERT and UPDATE elements. It is defined in the SQL:2003 standard; prior to that, some databases provided similar functionality via different syntax, sometimes called an "upsert".
* TRUNCATE deletes all data from a table (a non-standard but common SQL command).
* DELETE removes zero or more existing rows from a table.
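Before MERGE was standardized, an "upsert" was often emulated with two statements: try an UPDATE, and INSERT only if no row was changed. The sketch below shows that pattern with Python's sqlite3 module. It is not MERGE syntax, and the kv table and the upsert helper are hypothetical. Running both statements in one transaction keeps them together, but this emulation is not safe against concurrent writers on its own; the primary key will at least reject a duplicate insert.

```python
import sqlite3


def upsert(conn, key, value):
    """Update the row for key if it exists, otherwise insert it (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("UPDATE kv SET v = ? WHERE k = ?", (value, key))
        if cur.rowcount == 0:  # no existing row matched, so insert one
            conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, value))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
upsert(conn, "a", "first")    # inserts
upsert(conn, "a", "second")   # updates the existing row
upsert(conn, "b", "other")    # inserts a new row
final = conn.execute("SELECT k, v FROM kv ORDER BY k").fetchall()
print(final)
```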

Example:
INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL);
UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N';
DELETE FROM my_table WHERE field2 = 'N';
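The three statements above can be run in order with Python's sqlite3 module to see the table after each one. The table and column names come from the example; declaring the columns as TEXT is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# my_table and its columns come from the example; the TEXT types are assumed.
conn.execute("CREATE TABLE my_table (field1 TEXT, field2 TEXT, field3 TEXT)")


def snapshot():
    """Return every row currently in my_table."""
    return conn.execute("SELECT field1, field2, field3 FROM my_table").fetchall()


conn.execute("INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL)")
after_insert = snapshot()   # [('test', 'N', None)]
conn.execute("UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N'")
after_update = snapshot()   # [('updated value', 'N', None)]
conn.execute("DELETE FROM my_table WHERE field2 = 'N'")
after_delete = snapshot()   # []
conn.commit()
```

Note that SQL NULL comes back as Python's None.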


Data transaction

Transactions, if available, can be used to wrap DML operations.

* START TRANSACTION (or BEGIN WORK, depending on SQL dialect) can be used to mark the start of a database transaction, which either completes entirely or not at all.
* COMMIT causes all data changes in a transaction to be made permanent.
* ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded, so that the state of the data is "rolled back" to the way it was prior to those changes being requested.

COMMIT and ROLLBACK interact with areas such as transaction control and locking. Strictly, both terminate any open transaction and release any locks held on data. In the absence of a START TRANSACTION or similar statement, the semantics of SQL are implementation-dependent.

Example:
START TRANSACTION;
UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants';
COMMIT;
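The sketch below runs the inventory example with Python's sqlite3 module and then shows a ROLLBACK discarding a second change. Two assumptions apply: SQLite spells the opening statement BEGIN rather than START TRANSACTION, and the driver is put in autocommit mode (isolation_level=None) so that the explicit statements control the transaction. The starting quantity of 10 is hypothetical.

```python
import sqlite3

# isolation_level=None: the driver issues no implicit BEGIN/COMMIT of its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('pants', 10)")


def quantity():
    return conn.execute(
        "SELECT quantity FROM inventory WHERE item = 'pants'").fetchone()[0]


conn.execute("BEGIN")  # SQLite's equivalent of START TRANSACTION
conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants'")
conn.execute("COMMIT")
after_commit = quantity()      # 7

conn.execute("BEGIN")
conn.execute("UPDATE inventory SET quantity = quantity - 5 WHERE item = 'pants'")
conn.execute("ROLLBACK")       # discards the uncommitted change
after_rollback = quantity()    # still 7
```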


Data definition

The second group of keywords is the Data Definition Language (DDL). DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions in their DDL, which allow control over nonstandard features of the database system.

The most basic items of DDL are the CREATE and DROP commands.

* CREATE causes an object (a table, for example) to be created within the database.
* DROP causes an existing object within the database to be deleted, usually irretrievably.

Some database systems also have an ALTER command, which permits the user to modify an existing object in various ways, for example by adding a column to an existing table.

Example:
CREATE TABLE my_table (
    my_field1 INT UNSIGNED,  -- UNSIGNED is a MySQL extension, not standard SQL
    my_field2 VARCHAR(50),
    my_field3 DATE NOT NULL,
    PRIMARY KEY (my_field1, my_field2)
);
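The constraints declared in the CREATE TABLE example, and the ALTER command mentioned above, can be exercised with Python's sqlite3 module. This relies on SQLite-specific behavior: SQLite accepts the type name INT UNSIGNED (giving it integer affinity) but does not enforce the UNSIGNED part. The inserted rows, the my_field4 column, and the rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        my_field1 INT UNSIGNED,
        my_field2 VARCHAR(50),
        my_field3 DATE NOT NULL,
        PRIMARY KEY (my_field1, my_field2)
    )
""")
conn.execute("INSERT INTO my_table VALUES (1, 'a', '2006-08-02')")


def rejected(sql):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same (my_field1, my_field2) pair as the existing row: composite key violation.
dup_key = rejected("INSERT INTO my_table VALUES (1, 'a', '2006-08-03')")
# NULL in a NOT NULL column.
null_date = rejected("INSERT INTO my_table VALUES (2, 'b', NULL)")

# ALTER adds a column to the existing table; PRAGMA table_info lists the columns.
conn.execute("ALTER TABLE my_table ADD COLUMN my_field4 TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(my_table)")]
print(dup_key, null_date, columns)
```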


Data control

The third group of SQL keywords is the Data Control Language (DCL). DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database.

Its two main keywords are:

* GRANT — authorizes one or more users to perform an operation or a set of operations on an object.
* REVOKE — removes or restricts the capability of a user to perform an operation or a set of operations.

Example:
GRANT SELECT, UPDATE ON my_table TO some_user, another_user;
REVOKE UPDATE ON my_table FROM another_user;


Other

* ANSI-standard SQL uses -- to mark a single-line comment (some implementations also support curly brackets or C-style /* comments */ for multi-line comments).

Example:
SELECT * FROM inventory -- Retrieve everything from inventory table

* Some SQL servers allow user-defined functions.
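As one concrete case of a user-defined function, Python's sqlite3 module lets a Python function be registered with Connection.create_function and then called from SQL. Other database systems register functions very differently (PostgreSQL, for example, uses CREATE FUNCTION), so this is only an illustration; the SHOUT function and the inventory row are hypothetical. The query also carries both comment styles described above, which SQLite accepts.

```python
import sqlite3


def shout(text):
    """Hypothetical scalar function: upper-cases its argument and appends '!'."""
    return None if text is None else text.upper() + "!"


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name SHOUT, taking one argument.
conn.create_function("SHOUT", 1, shout)
conn.execute("CREATE TABLE inventory (item TEXT)")
conn.execute("INSERT INTO inventory VALUES ('pants')")

rows = conn.execute("""
    SELECT SHOUT(item)  -- single-line comment
    FROM inventory      /* C-style comment */
""").fetchall()
print(rows)   # [('PANTS!',)]
```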


Database systems using SQL

* List of relational database management systems
* List of object-relational database management systems
* List of hierarchical database management systems


Criticisms of SQL

Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but in violation of, the relational model for database management and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto.

In addition, there are some criticisms of the practical use of SQL:

* The language syntax is rather complex (sometimes called "COBOL-like").

* It does not provide a standard way, or at least a commonly supported way, to split large commands into multiple smaller ones that reference each other by name. This tends to result in "run-on SQL sentences" and may force one into deep hierarchical nesting when a graph-like (reference-by-name) approach would be more appropriate and would factor out repetition better.

* Implementations are inconsistent and, usually, incompatible between vendors.

* For larger statements, it is often difficult to factor repeated patterns and expressions into a single place, so the same change must be made in several places within one statement.

* The difference between value-to-column assignment in UPDATE and INSERT syntax is puzzling.

* The language cannot easily be extended by programmers or DBAs. Although some variants allow functions to be added, those functions can only take scalar values, not tables (real or virtual), as arguments. The archaic syntax (above) may be part of the reason for this.

* It makes it too easy to write a Cartesian join, which produces "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely intended in practice that requiring an explicit CARTESIAN keyword may be warranted.
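The run-away effect is easy to reproduce. In this sketch, using Python's sqlite3 module with hypothetical customers and orders tables, dropping the join condition from the WHERE clause silently turns a 4-row result into a 3 × 4 = 12-row Cartesian product, and no error is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")


def count(sql):
    return conn.execute(sql).fetchone()[0]


# Intended query: each order matched to its customer.
joined = count("""SELECT COUNT(*) FROM customers, orders
                  WHERE orders.customer_id = customers.id""")   # 4
# Same query with the join condition left out: every customer is paired
# with every order.
cartesian = count("SELECT COUNT(*) FROM customers, orders")     # 12
```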


Alternatives to SQL

A distinction should be made between alternatives to relational databases and alternatives to SQL. The languages listed below are proposed alternatives to SQL but are still (nominally) relational. See navigational database for alternatives to relational.

* IBM Business System 12 (IBM BS12)
* Tutorial D
* TQL - Luca Cardelli (May not be relational)
* Top's Query Language - A draft language influenced by IBM BS12. Tentatively renamed to SMEQL to avoid confusion with similar projects called TQL.
* Hibernate Query Language[2] (HQL) - A Java-based tool that uses modified SQL
* Quel introduced in 1974 by the U.C. Berkeley Ingres project.
* The Object Data Standard - Object Data Management Group.
* Object Query Language - Object Data Management Group.
* Datalog


References

1. Donald D. Chamberlin and Raymond F. Boyce, 1974. "SEQUEL: A structured English query language", International Conference on Management of Data, Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control, Ann Arbor, Michigan, pp. 249–264.

* Discussion on alleged SQL flaws (C2 wiki).
* Web page about FSQL: References and links.
* Galindo J., Urrutia A., Piattini M., 2005. "Fuzzy Databases: Modeling, Design and Implementation". Idea Group Publishing, Hershey, USA.


External links

* The 1995 SQL Reunion: People, Projects, and Politics (early history of SQL)
* SQL:2003, SQL/XML and the Future of SQL (webcast and podcast with Jim Melton, editor of the SQL standard)
* TenMinuteTutor guide to SQL
* A Gentle Introduction to SQL at SQLzoo
* SQL Help and Tutorial
* The SQL Language (PostgreSQL specific details included)
* SQL Exercises. SQL DML Help and Tutorial
* SQL Tutorial.
* Kihlman's SQL
* A free SQL cookbook for all SQL dialects
* Online Interactive SQL Tutorials



Retrieved from "http://en.wikipedia.org/wiki/SQL"



WHAT IS KEYWORD?

Keyword may mean:
(From Wikipedia, the free encyclopedia)

In a non-computer sense: