ICDL crawling
ICDL crawling is an open distributed web crawling technology based on Website Parse Template (WPT).
What is Website Parse Template?
Website Parse Template (WPT) is an XML-based open format that describes the HTML structure of website pages. The WPT format allows web crawlers to generate Semantic Web RDF triples for web pages. WPT is compatible with existing Semantic Web concepts defined by the W3C (RDF and OWL) and with the UNL specifications.
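The article does not reproduce the WPT schema, so the following is only a minimal sketch of template-driven extraction. It assumes a simplified, hypothetical template format in which each field element pairs an ElementTree path expression with an RDF predicate URI. The template and page markup, the `extract_triples` function and the `example.org` predicate are illustrative inventions. The page is also assumed to be well-formed XHTML so that Python's standard `xml.etree.ElementTree` can parse it; real pages usually need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified template in the spirit of WPT. The element and
# attribute names are illustrative assumptions, NOT the real WPT schema.
TEMPLATE_XML = """
<template site="example.com">
  <field path=".//h1[@class='title']" predicate="http://purl.org/dc/terms/title"/>
  <field path=".//span[@class='price']" predicate="http://example.org/schema#price"/>
</template>
"""

# A page assumed to be well-formed XHTML (no namespace) so ElementTree can read it.
PAGE_XHTML = """
<html>
  <body>
    <h1 class="title">Blue Widget</h1>
    <span class="price">9.99</span>
  </body>
</html>
"""


def extract_triples(template_xml: str, page_xhtml: str,
                    page_url: str) -> list[tuple[str, str, str]]:
    """Apply each template field to the page and emit (subject, predicate, object) triples."""
    template = ET.fromstring(template_xml)
    page = ET.fromstring(page_xhtml)
    triples = []
    for fld in template.findall("field"):
        node = page.find(fld.get("path"))  # first element matching the path
        if node is not None and node.text:
            # The page URL is the subject; the matched text becomes a literal object.
            triples.append((page_url, fld.get("predicate"), node.text.strip()))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(TEMPLATE_XML, PAGE_XHTML, "http://example.com/widget"):
        print(triple)
```

Each matched field yields one triple, with the page URL as subject and the element text as a literal object. Fields that do not match the page are simply skipped.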
Distributed ICDL crawling
ICDL crawling parses website content according to the HTML structure templates described in WPT files.
Distributed crawling is carried out by an open-source client/server application installed on volunteers’ personal computers. After authentication, the application registers each PC as a Distributed Crawling node. The crawler periodically receives tasks from the management console to download specified websites, parse their content and submit the results to the Parsed Content Storage. Crawling processes are activated only when the user’s computer is idle and its Internet connection is not in use.
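The node lifecycle described above can be sketched in Python as a simplified, single-process illustration. It is not the ICDL client. `ManagementConsole`, `Task`, `run_node`, the token check and the idle/network callbacks are all hypothetical stand-ins, and the Parsed Content Storage is modelled as an in-memory dictionary. Periodic polling is reduced to a loop that stops as soon as the computer is no longer idle, the connection becomes busy, or the task queue is empty.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    url: str          # website page to download
    template_id: str  # hypothetical id of the WPT template to apply


@dataclass
class ManagementConsole:
    """In-memory stand-in for the management console (all names hypothetical)."""
    pending: deque = field(default_factory=deque)
    nodes: set = field(default_factory=set)
    # Stand-in for Parsed Content Storage: url -> {node_id: parse result}
    storage: dict = field(default_factory=dict)

    def register(self, node_id: str, token: str) -> bool:
        # Placeholder authentication: any non-empty token is accepted.
        if not token:
            return False
        self.nodes.add(node_id)
        return True

    def next_task(self, node_id: str) -> Optional[Task]:
        # Only registered nodes receive work.
        if node_id not in self.nodes or not self.pending:
            return None
        return self.pending.popleft()

    def submit(self, node_id: str, task: Task, result: str) -> None:
        # Results are kept per node so several crawlers' output can be compared.
        self.storage.setdefault(task.url, {})[node_id] = result


def run_node(console: ManagementConsole, node_id: str,
             is_idle: Callable[[], bool],
             network_busy: Callable[[], bool],
             fetch_and_parse: Callable[[Task], str]) -> int:
    """Process tasks while the PC is idle and the connection is free; return the count."""
    processed = 0
    while is_idle() and not network_busy():
        task = console.next_task(node_id)
        if task is None:
            break
        console.submit(node_id, task, fetch_and_parse(task))
        processed += 1
    return processed


if __name__ == "__main__":
    console = ManagementConsole(pending=deque([Task("http://example.com/a", "t1"),
                                              Task("http://example.com/b", "t1")]))
    console.register("pc-1", "secret")
    done = run_node(console, "pc-1", lambda: True, lambda: False,
                    lambda t: "parsed:" + t.url)
    print(done, console.storage)
```

Passing the idle and network checks in as callables keeps the sketch testable. A real client would query the operating system instead.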
The management console compares the parse results that several crawlers produce for the same content in order to improve the accuracy of the crawling results. The stored results can then be used by thematic and general-purpose search engines with different search algorithms, such as Google, Live, Yahoo! and Froogle, to perform more accurate web searches.
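The article does not say how the management console compares results. One plausible approach, shown here only as an assumption, is majority voting. For a given field of a page, the console accepts the value that more than a chosen fraction of crawlers reported. Otherwise it leaves the field unresolved, for example to be crawled again. The `reconcile` function and its `quorum` parameter are hypothetical.

```python
from collections import Counter
from typing import Optional


def reconcile(results: dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the value reported by more than `quorum` of the crawlers, else None.

    `results` maps a crawler (node) id to the value it parsed for one field.
    """
    if not results:
        return None
    # Most frequently reported value and how many crawlers reported it.
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes / len(results) > quorum else None


if __name__ == "__main__":
    print(reconcile({"pc-1": "9.99", "pc-2": "9.99", "pc-3": "99.9"}))
    print(reconcile({"pc-1": "a", "pc-2": "b"}))
```

With three crawlers reporting "9.99", "9.99" and "99.9" for a price field, `reconcile` returns "9.99". With a one-to-one split it returns `None`, because neither value exceeds the default quorum of one half.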