Review of programs for searching documents and data. Software and services for professional search Internet resources for searching professional information

For professional Internet searches you need specialized software, as well as specialized search engines and search services.

PROGRAMS

http://dr-watson.wix.com/home – the program is designed for exploring arrays text information in order to identify entities and connections between them. The result of the work is a report on the object under study.

http://www.fmsasg.com/ - one of the best programs in the world for visualizing connections and relationships Sentinel Vizualizer. The company has completely Russified its products and connected hotline in Russian.

http://www.newprosoft.com/ – “Web Content Extractor” is the most powerful, easy-to-use software for extracting data from web sites. It also has an effective Visual Web spider.

SiteSputnik a software package that has no analogues in the world, allowing you to search and process its results on the Visible and Invisible Internet, using all the search engines necessary for the user.

WebSite-Watcher – allows you to monitor web pages, including password-protected ones, monitoring forums, RSS feeds, news groups, local files. Has a powerful filter system. Monitoring is carried out automatically and is delivered in a user-friendly form. A program with advanced functions costs 50 euros. Constantly updated.

http://www.scribd.com/ is the most popular platform in the world and increasingly used in Russia for posting various kinds of documents, books, etc. for free access with a very convenient search engine for titles, topics, etc.

http://www.atlasti.com/ is the most powerful and effective tool for qualitative information analysis available to individual users, small and even medium-sized businesses. The program is multifunctional and therefore useful. It combines the ability to create a unified information environment for working with various text, tabular, audio and video files as a single whole, as well as tools for qualitative analysis and visualization.

Ashampoo ClipFinder HD – an ever-increasing share of the information flow comes from video. Accordingly, competitive intelligence officers need tools that allow them to work with this format. One such product is the free utility we present. It allows you to search for videos based on specified criteria on video file storage sites such as YouTube. The program is easy to use, displays all search results on one page with detailed information, titles, duration, time when the video was uploaded to the storage, etc. There is a Russian interface.

http://www.advego.ru/plagiatus/ – the program is made SEO optimizers, but is quite suitable as an Internet intelligence tool. Plagiarism shows the degree of uniqueness of the text, the sources of the text, and the percentage of text match. The program also checks the uniqueness of the specified URL. The program is free.

http://neiron.ru/toolbar/ – includes an add-on for combining Google search and Yandex, and also allows for competitive analysis based on assessing the effectiveness of sites and contextual advertising. Implemented as a plugin for FF and GC.

http://web-data-extractor.net/ is a universal solution for obtaining any data available on the Internet. Setting up data cutting from any page is done in a few mouse clicks. You just need to select the data area that you want to save and Datacol will automatically select a formula for cutting out this block.

CaptureSaver is a professional Internet research tool. Simply irreplaceable working programm, allowing you to capture, store and export any Internet information, including not only web pages, blogs, but also RSS news, email, images and much more. It has the widest functionality, an intuitive interface and a ridiculous price.

http://www.orbiscope.net/en/software.html – web monitoring system at more than affordable prices.

http://www.kbcrawl.co.uk/ – software for work, including on the “Invisible Internet”.

http://www.copernic.com/en/products/agent/index.html – the program allows you to search using more than 90 search engines, more than 10 parameters. Allows you to combine results, eliminate duplicates, block broken links, and show the most relevant results. Comes in free, personal and professional versions. Used by more than 20 million users.

Maltego is a fundamentally new software that allows you to establish the relationship of subjects, events and objects in real life and on the Internet.

SERVICES

new – web browser with dozens of pre-installed tools for OSINT.

– an effective search engine-aggregator for finding people on major Russian social networks.

https://hunter.io/ is an effective service for detecting and checking email.

https://www.whatruns.com/ is an easy to use yet effective scanner to discover what is working and not working on a website and what its security holes are. Also implemented as a plugin for Chrom.

https://www.crayon.co/ is an American budget platform for market and competitive intelligence on the Internet.

http://www.cs.cornell.edu/~bwong/octant/ – host identifier.

https://iplogger.ru/ – a simple and convenient service for determining someone else’s IP.

http://linkurio.us/ is a powerful new product for economic security workers and corruption investigators. Processes and visualizes huge amounts of unstructured information from financial sources.

http://www.intelsuite.com/en – English-language online platform for competitive intelligence and monitoring.

http://yewno.com/about/ is the first operating system for translating information into knowledge and visualizing unstructured information. Currently supports English, French, German, Spanish and Portuguese.

https://start.avalancheonline.ru/landing/?next=%2F – forecasting and analytical services by Andrey Masalovich.

https://www.outwit.com/products/hub/ – a complete set of stand-alone programs for professional work in web 1.

https://github.com/search?q=user%3Acmlh+maltego – extensions for Maltego.

http://www.whoishostingthis.com/ – search engine for hosting, IP addresses, etc.

http://appfollow.ru/ – analysis of applications based on reviews, ASO optimization, positions in tops and search results for the App Store, Google Play and Windows Phone Store.

http://spiraldb.com/ is a service implemented as a plugin for Chrom, which allows you to get a lot of valuable information about any electronic resource.

https://millie.northernlight.com/dashboard.php?id=93 - free service, collecting and structuring key information by industry and company. It is possible to use information panels based on text analysis.

http://byratino.info/ – collection of factual data from publicly available sources on the Internet.

http://www.datafox.co/ – CI platform collects and analyzes information on companies of interest to clients. There is a demo.

https://unwiredlabs.com/home - a specialized application with an API for searching by geolocation of any device connected to the Internet.

http://visualping.io/ – a service for monitoring sites and, first of all, the photographs and images available on them. Even if the photo only appears for a second, it will be in the subscriber's email. Has a plugin for Google Chrome.

http://spyonweb.com/ is a research tool that allows for in-depth analysis of any Internet resource.

http://bigvisor.ru/ – the service allows you to track advertising campaigns for certain segments of goods and services, or specific organizations.

http://www.itsec.pro/2013/09/microsoft-word.html – instructions for use by Artem Ageev Windows programs for competitive intelligence needs.

http://granoproject.org/ is an open source tool source code for researchers who track networks of connections between individuals and organizations in politics, economics, crime, etc. Allows you to connect, analyze and visualize information obtained from various sources, as well as show significant connections.

http://imgops.com/ – service for extracting metadata from graphic files and working with them.

http://sergeybelove.ru/tools/one-button-scan/ – a small online scanner for checking security holes in websites and other resources.

http://isce-library.net/epi.aspx – service for searching primary sources using a fragment of text on English language

https://www.rivaliq.com/ is an effective tool for conducting competitive intelligence in Western, primarily European and American markets for goods and services.

http://watchthatpage.com/ is a service that allows you to automatically collect new information from monitored Internet resources. The service is free.

http://falcon.io/ is a kind of Rapportive for the Web. It is not a replacement for Rapportive, but provides additional tools. In contrast, Rapportive provides a general profile of a person, as if glued together from data from social networks and mentions on the web. http://watchthatpage.com/ - a service that allows you to automatically collect new information from monitored resources on the Internet. The service is free.

https://addons.mozilla.org/ru/firefox/addon/update-scanner/ – add-on for Firefox. Monitors web page updates. Useful for websites that do not have news feeds (Atom or RSS).

http://agregator.pro/ – aggregator of news and media portals. Used by marketers, analysts, etc. to analyze news flows on certain topics.

http://price.apishops.com/ – automated web service for monitoring prices for selected product groups, specific online stores and other parameters.

http://www.la0.ru/ is a convenient and relevant service for analyzing links and backlinks to an Internet resource.

www.recordedfuture.com is a powerful tool for data analysis and visualization, implemented as an online service built on cloud computing.

http://advse.ru/ is a service with the slogan “Find out everything about your competitors.” Allows you to obtain competitors' websites in accordance with search queries and analyze competitors' advertising campaigns in Google and Yandex.

http://spyonweb.com/ – the service allows you to identify sites with the same characteristics, including those using the same Google Analytics statistics service identifiers, IP addresses, etc.

http://www.connotate.com/solutions – a line of products for competitive intelligence, managing information flows and converting information into information assets. It includes both complex platforms and simple, cheap services that allow for effective monitoring along with information compression and obtaining only the necessary results.

http://www.clearci.com/ – competitive intelligence platform for business various sizes from startups and small companies to Fortune 500 companies. Solved as saas.

http://startingpage.com/ is a Google add-on that allows you to search on Google without recording your IP address. Fully supports all search engines Google features, including in Russian.

http://newspapermap.com/ – unique service, very useful for a competitive scout. Connects geolocation with an online media search engine. Those. you select the region you are interested in, or even a city, or language, see the place on the map and a list of online versions of newspapers and magazines, click on the appropriate button and read. Supports Russian language, very user-friendly interface.

http://infostream.com.ua/ is a very convenient news monitoring system “Infostream”, distinguished by a first-class selection and quite accessible to any wallet, from one of the classics of Internet search, D.V. Lande.

http://www.instapaper.com/ is a very simple and effective tool for saving the necessary web pages. Can be used on computers, iPhones, iPads, etc.

http://screen-scraper.com/ – allows you to automatically extract all information from web pages, download the vast majority of file formats, and automatically enter data into various forms. It saves downloaded files and pages in databases and performs many other extremely useful functions. Works on all major platforms, has fully functional free and very powerful professional versions.

http://www.mozenda.com/ - having several tariff plans and a web service of multifunctional web monitoring and delivery of information necessary for the user from selected sites, available even to small businesses.

http://www.recipdonor.com/ - the service allows you to automatically monitor everything that happens on competitors' websites.

http://www.spyfu.com/ – and this is if your competitors are foreign.

www.webground.su is a service for monitoring the Runet created by Internet search professionals, which includes all the major providers of information, news, etc., and is capable of individual monitoring settings to suit the user’s needs.

SEARCH ENGINES

https://www.idmarch.org/ is the best search engine for the world archive of pdf documents in terms of quality. Currently, more than 18 million pdf documents have been indexed, ranging from books to secret reports.

http://www.marketvisual.com/ is a unique search engine that allows you to search for owners and top management by full name, company name, position, or a combination thereof. The search results contain not only the objects you are looking for, but also their connections. Designed primarily for English-speaking countries.

http://worldc.am/ is a search engine for freely accessible photographs linked to geolocation.

https://app.echosec.net/ is a public search engine that describes itself as the most advanced analytical tool for law enforcement and security and intelligence professionals. Allows you to search for photos posted on various sites, social platforms and social networks in relation to specific geolocation coordinates. There are currently seven data sources connected. By the end of the year their number will be more than 450. Thanks to Dementy for the tip.

http://www.quandl.com/ is a search engine for seven million financial, economic and social databases.

http://bitzakaz.ru/ – search engine for tenders and government orders with additional paid functions

Website-Finder - makes it possible to find sites that Google does not index well. The only limitation is that it only searches 30 websites for each keyword. The program is easy to use.

http://www.dtsearch.com/ is a powerful search engine that allows you to process terabytes of text. Works on desktop, web and intranet. Supports both static and dynamic data. Allows you to search in all MS Office programs. The search is carried out using phrases, words, tags, indexes and much more. The only federated search engine available. It has both paid and free versions.

http://www.strategator.com/ – searches, filters and aggregates information about the company from tens of thousands of web sources. Searches in the USA, Great Britain, major EEC countries. It is highly relevant, user-friendly, and has free and paid options ($14 per month).

http://www.shodanhq.com/ is an unusual search engine. Immediately after his appearance, he received the nickname “Google for hackers.” It does not search for pages, but determines IP addresses, types of routers, computers, servers and workstations located at a particular address, and traces chains DNS servers and allows you to implement many others interesting features for competitive intelligence.

http://search.usa.gov/ – search engine for websites and open databases all US government agencies. The databases contain a lot of practical, useful information, including for use in our country.

http://visual.ly/ – today visualization is increasingly used to present data. This is the first infographic search engine on the Web. Along with the search engine, the portal has powerful data visualization tools that do not require programming skills.

http://go.mail.ru/realtime – search for discussions of topics, events, objects, subjects in real or customizable time. The previously highly criticized search in Mail.ru works very effectively and provides interesting, relevant results.

Zanran is just launched, but already working great, the first and only search engine for data that extracts it from PDF files, EXCEL tables, data on HTML pages.

http://www.ciradar.com/Competitive-Analysis.aspx is one of the world's best information retrieval systems for competitive intelligence on the deep web. Retrieves almost all types of files in all formats on the topic of interest. Implemented as a web service. The prices are more than reasonable.

http://public.ru/ – Effective search and professional analysis of information, media archive since 1990. The online media library offers a wide range information services: from access to electronic archives of publications of Russian-language media and ready-made thematic press reviews to individual monitoring and exclusive analytical research carried out based on press materials.

Cluuz is a young search engine with ample opportunities for competitive intelligence, especially on the English-language Internet. Allows you not only to find, but also to visualize and establish connections between people, companies, domains, e-mails, addresses, etc.

www.wolframalpha.com – the search engine of tomorrow. In response to a search request, it provides statistical and factual information available on the request object, including visualized information.

www.ist-budget.ru – universal search in databases of government procurement, tenders, auctions, etc.

Alexey Kutovenko

Professional Internet Search

Introduction

Internet search is an important element of working on the Internet. Hardly anyone knows for sure the exact number of web resources on the modern Internet. In any case, the count is in the billions. In order to be able to use the information needed at a given moment, no matter for work or entertainment purposes, you first need to find it in this constantly replenished ocean of resources. This is not an easy task at all, since information on the modern Internet is not structured, which creates problems in finding it. It is no coincidence that the peculiar “windows” into this information space Internet search engines have become

It is unlikely that among Internet users there will be people who have never used large universal search engines. The names Google, Yandex and a couple of other big machines are on everyone’s lips. They cope remarkably well with everyday Internet search tasks, and often users don’t even try to look for a replacement. At the same time, the number of Internet search engines in our time amounts to thousands. The reasons for such a variety of alternative machines have different roots. Some projects are trying to directly compete with global market leaders through careful work with national Internet resources. Others offer query capabilities not available from well-known search engines. A significant number of alternative engines specialize in searching for a certain topic area or a certain type of content, achieving impressive results in solving these problems. Be that as it may, the inclusion of such search engines in a user's own arsenal of Internet search tools can significantly improve its quality. However, there is one nuance here: you need to know about such machines and be able to use their capabilities.

We assume that readers of this book are already quite familiar with search techniques using universal search engines. It was so good that they felt the limitations associated with their use. Most likely, such people have already tried to look for and use certain additional tools. The printed word does not ignore the topic of Internet search: articles appear periodically and books are published. But their heroes, as a rule, are the same - several leading universal search engines. What makes this book different is that it attempts to cover the full range of modern search solutions. Here you will find descriptions and recommendations for using the best modern services aimed at solving the most common search problems. This book is for people who work a lot on the Internet and use the Web to search. necessary information– be it business, study or hobby.

In order for an Internet search to be successful, two conditions must be met: queries must be well formulated and they must be asked in appropriate places. In other words, the user is required, on the one hand, to be able to translate his search interests into the language of the search query, and on the other hand, a good knowledge of search engines, available search tools, their advantages and disadvantages, which will allow him to choose the most suitable search tools in each specific case .

Currently, there is no single resource that satisfies all Internet search requirements. Therefore, if you take your search seriously, you inevitably have to use different tools, using each in the most appropriate case.

There are many search tools available. They can be combined into several groups, each of which has certain advantages and disadvantages. The chapters of our book are devoted to the main groups of modern Internet search engines.

Chapter 1, “Universal Internet Search Engines,” is devoted to large universal systems for retrieving information on the Web. The main focus is on their most advanced instruments, which usually fall under the radar of the general public. A review of the capabilities of known machines gives us a kind of starting point and allows us to clearly imagine the scope of application of alternative search solutions.

Chapter 2, “Vertical Search,” talks about systems that specialize in specific topic areas or specific types of content.

Chapter 3, “Metasearch,” examines metasearch engines that can send a query simultaneously to several Internet search engines, and then collect and process the results in a single interface.

Chapter 4, “Semantic and Visual Internet Search Engines,” provides an overview of experimental systems that offer original user interfaces as well as interesting approaches to query processing.

Chapter 5, “Recommendation Machines,” introduces recently emerging search services, in English aptly named “Discovery Engines”, that is, “discovery machines”. With their help, you can process a number of queries that are too tough for other types of Internet search engines.

If no ready-made product suits you, you can create your own Internet search engine. Chapter 6, “Personal Search Engines,” is devoted to the creation of such personal machines.

Several chapters of our book are devoted to searching various types network content. Chapter 7, “Image Retrieval,” introduces current trends in Internet image retrieval as well as the capabilities of related experimental systems. Chapter 8, “Video Search,” offers an overview of the video search tools of the leading universal Internet search engines, as well as the best specialized systems in this area.

Chapter 9, “Finding “Hidden” Content,” is an overview of systems that allow you to search for content that is “not seen” by universal search engines. Such “hidden” content includes, for example, torrents or files hosted on FTP servers and file hosting sites.

Chapter 10, “Search for Web 3.0,” introduces Internet search tools for data in Semantic Web formats.

The search does not end with simply receiving results from one or another search engine. The last chapter of our book, Chapter 11, “Helper Programs,” is devoted to tools for processing and saving results.

Before starting a story about specific products, it makes sense to understand the classification of modern Internet search tools, as well as define the terms that are constantly found on the pages of our book.

The main Internet search tools can be divided into the following main groups:

Search engines;

Web directories;

Help Resources;

Local programs for searching the Internet.

The most popular search tools are search engines - the so-called Internet search engines (Search Engines). The top three leaders on a global scale are quite stable - Google, Yahoo! and Bing. In many countries, their own local search engines, optimized for working with local content, are added to this list. With their help, you can theoretically find any specific word on the pages of many millions of sites.

Despite many differences, all Internet search engines operate on similar principles and, from a technical point of view, consist of similar subsystems.

The first structural part of the search engine is special programs, used for automatic search and subsequent indexing of web pages. Such programs are usually called spiders, or bots. They look at the code of web pages, find links located on them, and thereby discover new web pages. There is an alternative way to include a site in the index. Many search engines offer resource owners the opportunity to independently add a site to their database. However, the web pages are then downloaded, analyzed and indexed. They highlight structural elements, find keywords, and determine their connections with other sites and web pages. Other operations are also performed, the result of which is the formation of a search engine index database. This base is the second main element any search engine. Currently, there is no single absolutely complete index database that would contain information about all Internet content. Since different search engines use different web page search programs and build their index using different algorithms, search engine index databases can vary significantly. Some sites are indexed by several search engines, but there is always a certain percentage of resources included in the database of only one search engine. The presence of such an original and non-overlapping part of the index in each search engine allows us to draw an important practical conclusion: if you use only one search engine, even the largest one, you will definitely lose a certain percentage of useful links.

Finding the necessary and relevant information on the Internet is sometimes very difficult. The amount of information garbage on the Internet is growing like a snowball, and sometimes it is simply impossible to get to the data that you really need using traditional Yandex and Google. The book you are holding in your hands will increase the efficiency of your search for information on the Internet many times over. It describes techniques, search sites and programs for specialized information retrieval. Modern types of Internet search are considered: universal search, vertical search, metasearch systems, construction of personal search engines, search for audiovisual content, search by hidden internet. For all the systems considered, their characteristics and tips for maximum effective use are given.

Introduction

Internet search is an important element of working on the Internet. Hardly anyone knows for sure the exact number of web resources on the modern Internet. In any case, the count is in the billions. In order to be able to use the information needed at a given moment, no matter for work or entertainment purposes, you first need to find it in this constantly replenished ocean of resources. This is not an easy task at all, since information on the modern Internet is not structured, which creates problems in finding it. It is no coincidence that Internet search engines have become unique “windows” into this information space.

It is unlikely that among Internet users there will be people who have never used large universal search engines. The names Google, Yandex and a couple of other big machines are on everyone’s lips. They cope remarkably well with everyday Internet search tasks, and often users don’t even try to look for a replacement. At the same time, the number of Internet search engines in our time amounts to thousands. The reasons for such a variety of alternative machines have different roots. Some projects are trying to directly compete with global market leaders through careful work with national Internet resources. Others offer query capabilities not available from well-known search engines. A significant number of alternative engines specialize in searching for a certain topic area or a certain type of content, achieving impressive results in solving these problems. Be that as it may, the inclusion of such search engines in a user's own arsenal of Internet search tools can significantly improve its quality. However, there is one nuance here: you need to know about such machines and be able to use their capabilities.

We assume that readers of this book are already quite familiar with search techniques using universal search engines. It was so good that they felt the limitations associated with their use. Most likely, such people have already tried to look for and use certain additional tools. The printed word does not ignore the topic of Internet search: articles appear periodically and books are published. But their heroes, as a rule, are the same - several leading universal search engines. What makes this book different is that it attempts to cover the full range of modern search solutions. Here you will find descriptions and recommendations for using the best modern services aimed at solving the most common search problems. This book is for people who work a lot on the Internet and use the Network to find the information they need - be it business, study or hobby.

In order for an Internet search to be successful, two conditions must be met: queries must be well formulated and they must be asked in appropriate places. In other words, the user is required, on the one hand, to be able to translate his search interests into the language of the search query, and on the other hand, a good knowledge of search engines, available search tools, their advantages and disadvantages, which will allow him to choose the most suitable search tools in each specific case .

Currently, there is no single resource that satisfies all Internet search requirements. Therefore, if you take your search seriously, you inevitably have to use different tools, using each in the most appropriate case.

Chapter 1

Universal Internet search engines

Universal Internet search engines are the main and most famous means of Internet search. Such search engines provide maximum coverage of various resources. The largest and most popular search engines belong to the universal type. These are truly powerful solutions with a lot of features and tools that many users are often unaware of. Understanding the features and capabilities of universal search allows you to learn the strengths and weak sides such systems and consciously choose the most effective search tools.

The market for universal search engines is quite large. In this chapter, we will consider only the most powerful machines that can adequately work with queries in Russian. The chapter opens with stories about the leaders of Russian search - the Google.ru and Yandex systems. Books and a lot of articles have been written about each of these search engines. We will focus on the main features that matter to the end user and also try to identify their strengths.

They are accompanied by a new search development from Microsoft Corporation - the Bing system, which has so far been noticeably deprived of attention, as well as a useful and quite powerful search engine Exalead, the advantage of which is good support search in European Internet resources. This system– is still a rare guest in the search arsenal of our users, so it is considered in more detail than the others.

In this chapter, when reviewing Google systems and Yandex, we will focus only on web search capabilities, and search in specialized databases of these projects is discussed in the following chapters on image and video search. For other universal search engines, information about multimedia search is provided immediately upon introduction to them.

Since three of the four heroes of this chapter are of foreign origin, we immediately note that we are analyzing the capabilities of only their Russian versions. The fact is that some functions of foreign systems, especially experimental ones, are often available only in the original, usually English-language versions of the services.

Google

The Google search engine is deservedly considered the world leader in modern Internet search. Founded in 1998, Google remains one of the leading trendsetters in the field of Internet search and web services.

Google developers have always been distinguished by their increased attention to improving the algorithms of their search engine, as well as reasonable conservatism in the field of user interface. The capabilities of composing a query on Google can be called classic, and the methods of displaying search results have also become a kind of standard. Recently, Google developers have made serious changes in these areas - the largest search engine has begun to look too old-fashioned compared to its young competitors.

Google has one of the world's largest index databases, which provides a wide range of information sources. Google index information is consolidated into several vertical databases. In addition to the most well-known “Web” database, there are several multimedia databases (“Pictures”, “Video”) that work with sources of current information and messages on RSS feeds, the “News” database, as well as the “Blogs” database that indexes online diaries. In addition, Google offers a wide selection of additional resources, among which it is worth noting a mapping service, a website directory, and a question and answer service. These resources can also be thought of as search tools.

In the “Web” database, Google offers simple and advanced search modes for composing a query. In simple search mode from additional tools Only the virtual keyboard is available. Advanced search offers more options. Since the advanced search form is available in almost all Google search products, let’s look at it in more detail (Fig. 1.1).

Yandex

Officially presented to the general public in 1997, the Yandex search engine successfully developed and ten years later became one of the ten largest search engines in the world for the first time. In the Russian segment of the Internet, he has achieved a leading position, which he has no plans to relinquish yet, despite growing competition. The distinctive features of Yandex since the beginning of its existence have been its own original algorithms for determining the relevance of search results, flexible tools for working with query text, and taking into account the peculiarities of the morphology of the Russian language when processing them.

Yandex relies on its own index databases. In addition to searching web documents, the system offers a good selection of specialized resources and additional services. Yandex currently works with images, videos, news, blogs and dictionaries. Powerful search capabilities are also included in our own map service and product search system. In addition, Yandex maintains its own website directory. Yandex's strength is its developed local search program, which is especially important for our users. Yandex provides third-party developers with access to its databases. As a result, many Russian alternative Internet search projects use Yandex resources in one way or another. In addition to the regular search system, a shortened version of Yandex is also offered, available at ya.ru. The interface of this version consists only of a query input field and a search button.

Web Document Search offers simple and advanced search modes. A simple search does not provide any filters, which is compensated by the ability to automatically parse queries in natural language, confident processing of relatively long queries, as well as a system for automatic query completion. The maximum length of a request is forty words.

The advanced search form only offers one field for making a request. Logical operators, connecting the query words, it is proposed to enter manually, fortunately. Yandex has a fairly detailed query language. The remaining tools of the advanced search form are various filters (1.4).

Bing

The history of Internet search from Microsoft cannot be called simple. Algorithms, databases used and, of course, names have repeatedly changed on services consistently offered to the public. Until the early 2000s, the search engine did not have its own databases and worked with external indexes from AltaVista, Inktomi and Looksmart. The original name MSN Search was used until 2006, and then changing search engine names became a Microsoft tradition for several years.

Along with the final transition to searching in its own indexes, MSN Search was first renamed Windows LiveLive Search. Finally, in the early summer of 2009, Live Search was replaced by a new search project, Bing.

“Bing will allow you to take a different look at searching for information on the Internet and help users make important decisions,” was the beginning of Microsoft’s press release on the launch of Bing. The developers’ aspirations were clear: search engines from Microsoft, despite all their efforts, in the West were consistently inferior in popularity to the leaders - Google and Yahoo!. If we talk about the Russian-language versions of previous Microsoft search projects, then in terms of the quantity and quality of links found, they were much inferior to large Russian search engines. In an attempt to catch up with competitors, Bing's developers relied on improving search quality and introducing new technologies, many of which were acquired along with the companies that created them.

It should be noted that the Russian-language version of Bing, like most other localized versions, lacks a number of additional functions, such as shopping search. Since they, in fact, work only in the North. America, there is no point in dwelling on them in detail.

Exalead

One of the features of Europe, including in the field of Internet search, is the large number of national languages. A search engine that claims to be the leading one in Europe simply must index national segments of the Internet well and efficiently process queries in numerous European languages ​​- both the largest and less common. It is in this area that European development can receive serious attention. competitive advantage compared to powerful overseas competitors. The Exalead system is currently seriously vying for the role of such a European search engine. This project was developed within the framework of the Quaere research program funded by the European Union.

Exalead has its own index databases. The main search resources of the system are databases of web documents, images, videos and news. start page Exalead offers customization options. On this page you can place links to your favorite sites - they will be displayed in the form of graphic miniature screenshots. However, to do this you will have to register an account for free, and also allow your browser to store Exalead cookies.

Exalead Web Search offers simple and advanced search modes. The advanced search form, like in Bing, opens directly on the search results page. Note that Exalead offers not just a familiar form with a set of additional fields, but a complex drop-down menu that plays the role of a wizard for refining a query (Fig. 1.7). When you select one or another item in the wizard menu, new elements are added to the query string, and, if necessary, operators and special characters.

What is this

DuckDuckGo is a fairly well-known open source search engine. Servers are located in the USA. In addition to its own robot, the search engine uses results from other sources: Yahoo, Bing, Wikipedia.

The better

DuckDuckGo positions itself as a search engine that provides maximum privacy and confidentiality. The system does not collect any data about the user, does not store logs (no search history), use cookies as limited as possible.

DuckDuckGo does not collect or share personal information from users. This is our privacy policy.

Gabriel Weinberg, founder of DuckDuckGo

Why do you need this

All major search engines are trying to personalize search results based on data about the person in front of the monitor. This phenomenon is called the “filter bubble”: the user sees only those results that are consistent with his preferences or that the system deems as such.

Forms an objective picture that does not depend on your past behavior on the Internet, and eliminates thematic Google advertising and Yandex, based on your requests. With DuckDuckGo it's easy to search for information on foreign languages, while Google and Yandex by default give preference to Russian-language sites, even if the request is entered in another language.


What is this

not Evil is a system that searches the anonymous Tor network. To use it, you need to go to this network, for example by launching a specialized .

not Evil is not the only search engine of its kind. There is LOOK (the default search in the Tor browser, accessible from the regular Internet) or TORCH (one of the oldest search engines on the Tor network) and others. We settled on not Evil because of the clear hint from Google (just look at the start page).

The better

It searches where Google, Yandex and other search engines are generally closed.

Why do you need this

The Tor network contains many resources that cannot be found on the law-abiding Internet. And their number will grow as government control over the content of the Internet tightens. Tor is a kind of network within the Internet with its own social networks, torrent trackers, media, trading platforms, blogs, libraries, and so on.

3. YaCy

What is this

YaCy is a decentralized search engine that works on the principle of P2P networks. Each computer on which the main software module is installed scans the Internet independently, that is, it is analogous to a search robot. The results obtained are collected into a common database that is used by all YaCy participants.

The better

It’s difficult to say whether this is better or worse, since YaCy is a completely different approach to organizing search. The absence of a single server and owner company makes the results completely independent of anyone's preferences. The autonomy of each node eliminates censorship. YaCy is capable of searching the deep web and non-indexed public networks.

Why do you need this

If you are a supporter of open source software and a free Internet, not subject to the influence of government agencies and large corporations, then YaCy is your choice. It can also be used to organize a search within a corporate or other autonomous network. And even though YaCy is not very useful in everyday life, it is a worthy alternative to Google in terms of the search process.

4. Pipl

What is this

Pipl is a system designed to search for information about a specific person.

The better

The authors of Pipl claim that their specialized algorithms search more efficiently than “regular” search engines. In particular, priority is given to social network profiles, comments, member lists, and various databases that publish information about people, such as databases of court decisions. Pipl's leadership in this area is confirmed by assessments from Lifehacker.com, TechCrunch and other publications.

Why do you need this

If you need to find information about a person living in the US, then Pipl will be much more effective than Google. The databases of Russian courts are apparently inaccessible to the search engine. Therefore, he does not cope so well with Russian citizens.

What is this

FindSounds is another specialized search engine. Searches various sounds in open sources: house, nature, cars, people, and so on. The service does not support queries in Russian, but there is an impressive list of Russian-language tags that you can use to search.

The better

The output contains only sounds and nothing extra. In the settings you can set the desired format and sound quality. All sounds found are available for download. There is a search by pattern.

Why do you need this

If you need to quickly find the sound of a musket shot, the blows of a suckling woodpecker, or the cry of Homer Simpson, then this service is for you. And we chose this only from the available Russian-language queries. In English the spectrum is even wider.

Seriously, a specialized service requires a specialized audience. But what if it comes in handy for you too?

What is this

Wolfram|Alpha is a computational search engine. Instead of links to articles containing keywords, it provides a ready-made answer to the user's request. For example, if you enter “compare the populations of New York and San Francisco” into the search form in English, Wolfram|Alpha will immediately display tables and graphs with the comparison.

The better

This service is better than others for finding facts and calculating data. Wolfram|Alpha collects and organizes knowledge available on the Web from a variety of fields, including science, culture and entertainment. If this database contains a ready-made answer to a search query, the system displays it; if not, it calculates and displays the result. In this case, the user sees only nothing superfluous.

Why do you need this

If you're a student, analyst, journalist, or researcher, for example, you can use Wolfram|Alpha to find and calculate data related to your work. The service does not understand all requests, but it is constantly developing and becoming smarter.

What is this

The Dogpile metasearch engine displays a combined list of results from search results from Google, Yahoo and other popular systems.

The better

First, Dogpile displays fewer ads. Secondly, the service uses a special algorithm to find and show top scores from different search engines. According to the Dogpile developers, their systems generate the most complete search results on the entire Internet.

Why do you need this

If you can't find information on Google or another standard search engine, look for it in several search engines at once using Dogpile.

What is this

BoardReader is a system for text search in forums, question and answer services and other communities.

The better

The service allows you to narrow your search field to social platforms. Thanks to special filters, you can quickly find posts and comments that match your criteria: language, publication date and site name.

Why do you need this

BoardReader can be useful for PR specialists and other media specialists who are interested in the opinion of the masses on certain issues.

Finally

The life of alternative search engines is often fleeting. Lifehacker asked the former general director of the Ukrainian branch of Yandex, Sergei Petrenko, about the long-term prospects of such projects.


Sergey Petrenko

Former General Director of Yandex.Ukraine.

As for the fate of alternative search engines, it is simple: to be very niche projects with a small audience, therefore without clear commercial prospects or, conversely, with complete clarity of their absence.

If you look at the examples in the article, you can see that such search engines either specialize in a narrow but popular niche, which, perhaps, has not yet grown enough to be noticeable on the radars of Google or Yandex, or they are testing an original hypothesis in ranking, which is not yet applicable in regular search.

For example, if a search on Tor suddenly turns out to be in demand, that is, results from there are needed by at least a percentage of Google’s audience, then, of course, ordinary search engines will begin to solve the problem of how to find them and show them to the user. If the behavior of the audience shows that for a significant proportion of users in a significant number of queries, results given without taking into account factors depending on the user seem more relevant, then Yandex or Google will begin to produce such results.

“Be better” in the context of this article does not mean “be better at everything.” Yes, in many aspects our heroes are far from Yandex (even far from Bing). But each of these services gives the user something that the search industry giants cannot offer. Surely you also know similar projects. Share with us - let's discuss.

By mid-2015, the global Internet had already connected 3.2 billion users, that is, almost 43.8% of the planet’s population. For comparison: 15 years ago, only 6.5% of the population were Internet users, that is, the number of users has increased more than 6 times! But what is more impressive is not the quantitative, but the qualitative indicators of the expansion of the implementation of Internet technologies in various areas of human activity: from global communications of social networks to household Internet things. Mobile Internet provided users with the opportunity to be online outside the office and at home: on the road, outside the city in nature.
Currently, there are hundreds of systems for searching information on the Internet. The most popular of them are available to the vast majority of users because they are free and easy to use: Google, Yandex, Nigma, Yahoo!, Bing..... For more experienced users, “advanced search” interfaces, specialized searches “by social networks", according to news flows and purchase and sale advertisements... But all these wonderful search engines have a significant drawback, which I already noted above as an advantage: they are free.
If investors invest billions of dollars in the development of search engines, then a completely appropriate question arises: where do they make money?
And they make money, in particular, by providing in response to user requests not so much information that would be useful from the user’s point of view, but that which the owners of search engines consider useful for the user. This is done by manipulating the order in which lists of answers are issued to search queries users. Here is open advertising of certain Internet resources, and hidden manipulation of the relevance of answers based on the commercial, political and ideological interests of the owners of search engines.
Therefore, among professional specialists in searching for information on the Internet, the problem of pertinence of search engine results is very relevant.
Pertinence is the correspondence of documents found by an information retrieval system to the information needs of the user, regardless of how fully and how accurately this information need is expressed in the text of the information request itself. This is the ratio of the amount of useful information to the total amount of information received. Roughly speaking, this is search efficiency.
Specialists carrying out qualified searches for information on the Internet need to make certain efforts to filter search results, weeding out unnecessary information “noise”. And for this, professional-level search tools are used.
One of these professional systems is the Russian program FileForFiles & SiteSputnik (SiteSputnik).
Developer Alexey Mylnikov from Volgograd.

"The FileForFiles & SiteSputnik program (SiteSputnik) is designed to organize and automate professional search, collection and monitoring of information posted on the Internet. Special attention focuses on receiving new information on topics of interest. Several information analysis functions have been implemented."


Monitoring and categorization of information flows


First a few words about monitoring information flows, a special case of which is monitoring of media and social networks:

  • the user indicates the Sources that may contain the necessary information and the Rules for selecting this information;

  • the program downloads fresh links from Sources, frees their content from garbage and repetitions, and arranges them into Sections according to the Rules.

  • To see live a simple but real monitoring process, which involves 6 sources and 4 headings:
  • open the Demo version of the program;


  • then, in the window that appears, click on the button Together;

  • and when WebsiteSputnik will carry out this Project in real time, you:
    — in the “Clean Stream” list you will see all the new information from Sources,
    — in the “Post-request” section - only economic and financial news that satisfies the rule,
    - in the Headings "About the President", "About the Premiere" and "Central Bank", - information related to the relevant objects.

  • In real Projects, you can use almost any number of Sources and Rubrics.
    You can create your first working Projects in a few hours, and improve them during operation.
    The described information processing is available in the SiteSputnik Pro+News package and higher.

2. Simple and batch search, information collection

To get acquainted with the possibilities SiteSputnik Pro(basic version of the program) :

  • open the Demo version of the program;

  • enter your first request, for example, your full name, as I did:

    and click on the button Search.


  • The program (see the sign that SiteSputnik built) will poll in a few seconds 7 sources, will open in them 24 search pages, will find 227 relevant links, will remove duplicate links and from the remaining 156 unique list of links "An association".

    Name
    Source

    Ordered
    pages

    Downloaded
    pages

    Found
    links

    Time
    search

    Efficiency
    search

    Links
    New

    Efficiency
    New
    Yandex 5 5 50 0:00:05 32% 0 0
    Google 5 5 44 0:00:03 28% 0 0
    Yahoo 5 5 50 0:00:05 32% 0 0
    Rambler 5 4 56 0:00:07 36% 0 0
    MSN (Bing) 5 3 23 0:00:04 15% 0 0
    Yandex.Blogs 5 1 1 0:00:01 1% 0 0
    Google.Blogs 5 1 3 0:00:01 2% 0 0
    Total: 35 24 227 0:00:26 0 0
    Total: number of unique links - 156 , duplicate links - 46 %.

  • (! ) Repeat your request after a few hours or days, and you will see only new links that appeared in the Sources for this period of time. In the last two columns of the table you can see how many new links each Source brought and its efficiency in terms of “novelty”. When a query is executed multiple times, a list containing only new links , is created relative to all previous executions of this request. It would seem elementary and required function, but the author is not aware of any program in which it is implemented.

  • (!! ) The described capabilities are supported not only for individual requests, but also for entire request packets :

    The package you see consists of seven different queries that collect information about Vasily Shukshin from several Sources, including search engines, Wikipedia, exact search in Yandex news, metasearch and search for mentions on TV and radio stations. To the script TV and Radio includes: "Channel One", "TV Russia", NTV, RBC TV, "Echo of Moscow", radio company "Mayak", ... and other sources of information. Each Source has its own search or browsing depth in pages. It is listed in the third column.

    Batch search allows you to perform comprehensive searches with one click collection of information on a given topic.
    Separate list new links, upon repeated executions of the package, will contain only links that were not previously found.
    Remember what and when you asked the Internet and what it answered you No need- everything is automatically saved in libraries and in program databases.
    I repeat that the capabilities described in this paragraph are fully included in the package SiteSpunik Pro.


  • More details in the instructions: SiteSputnik Pro for beginners.

3. Objects and search monitoring

Quite often the User is faced with the following task. You need to find out what is on the Internet about a specific object: a person or a company. For example, when hiring a new employee or when a new counterparty appears, you always know the full name, company name, telephone numbers, INN, OGRN or OGRNIP, you can also take ICQ, Skype and some other data. Next, using an appeal to special function programs WebsiteSputnik "Collecting information about the object" (equipment SiteSputnik Pro+Objects):

You enter the data that you know, and with one click of the mouse you carry out accurate And full search for links containing specified information. The search is performed on several search engines at once, using all the details at once, using several possible combinations of recording details at once: remember how you can write down a phone number in different ways. After a certain period of time, without doing boring routine work, you will receive a list of links, cleared of repetitions and, most importantly, ordered by relevance to the object you are looking for. Relevance (significance) is achieved due to the fact that the first in the SiteSputnik search results will be those links on which large quantity the details you specified, and not those that moved up the search engine results of the Webmaster.

Important .
The SiteSputnik program is better than other programs at extracting real, but not official information about the Object. For example, in the official database mobile operator it may be recorded that the phone belongs to Vasily Terekhin, but in reality this phone contains information that Alexander sold a Ford Focus car in 2013, which is additional information for consideration.

Search monitoring .
Search monitoring means the following. If you need to track the occurrence new links, by a given object or arbitrary package of queries, then you just need to periodically repeat the corresponding search. Same as for a simple request, program SiteSputnik will create a "New" list, in which it will place only those links that were not found in any of the previous searches.

Search monitoring interesting not only in itself. It may be involved in monitoring the media, social networks and other news sources, which was mentioned above in paragraph 1. Unlike other programs, in which it is possible to obtain new information only from RSS feeds, in the program WebsiteSputnik can be used for this searches built into websites And search engines . Also possible emulation (self-creation) several RSS feeds from arbitrary pages, moreover, emulation of an RSS feed upon request and even a batch of requests.


  • To get the most out of the program, use its main functions, namely:

    • request packages, packages with parameters, use the Assembler (assembler), the "Analytical merging" operation of the results of several tasks, if necessary, apply basic search functions on the invisible Internet;

    • connect your sources to the information sources built into the program : other search engines and searches built into sites, existing RSS feeds created by you own RSS feeds With arbitrary pages, use the search function for new sources;

    • use the following types of features monitoring: Media, social networks and other sources, monitoring comments to news and messages, track the appearance of new information on existing pages;

    • engage Categories , External functions, Task Scheduler, mailing list, multiple computers, Project Instructor, install alarm To notify you of the occurrence of significant events, use the other functions listed below.



4. SiteSputnik program (SiteSputnik): options and features

- Program SiteSputnik is constantly improving in the following areas: "I need to find everything and with a guarantee".
"Interrogation Software for the Internet", - another definition of the User for assigning the program.

A. Functions for searching and collecting information.

. Request package - execution of several queries at once, combining search results or separately. When generating the combined result, repeatedly found links are removed. More details about packages can be found in the introduction to SiteSputnik, and visually in the video: a joint And separate execution of requests. There are no analogues in domestic and foreign developments.

. Packages with parameters. Any queries and query packages designed to solve standard search tasks, for example, search by phone number, full name or e-mail, - can be parameterized, saved and executed from a library of ready-made queries with the substitution of actual (necessary) parameter values. Each package with parameters is its own special advanced search form . It can use not one, but several search engines. You can create forms that are very complex in their functional purpose. It is extremely important that forms can be created by users themselves, without the participation of the program author or programmer. This is written very simply in the instructions, more details in a separate publication about search parameterization and on the forum, clearly in the video: search for all options for recording a number at once mobile phone and according to several options for recording the address Email. There are no analogues.

. Assembler NEW- assembling a search task from several ready-made ones : requests, request packages and parameter packages. Packages may contain other packages in their text. The depth of nesting of packages is unlimited. You can create several search tasks, for example, about several legal and individuals, and complete these tasks simultaneously. More details on the forum and in a separate publication about Assembler, clearly at video. There are no analogues.

. Metasearch - execution of a specific request simultaneously at a given “depth” of search for each of them. Metasearch is possible using built-in search engines, which include Yandex, Rambler, Google, Yahoo, MSN (Bing), Mail, Yandex and Google blogs, and connected search tools. Working with multiple search engines looks like you're working with one search engine . Re-found links are deleted. Visually metasearch on three connected social networks: VKontakte, Twitter and Youtube - shown on video.

. Metasearch on the site - combining site search in Google, Yahoo, Yandex, MSN (Bing). Clearly on video.

. Metasearch in office documents - combining search in files PDF format, XLS, DOC, RTF, PPT, FLASH in Google, Yahoo, Yandex, MSN (Bing). You can choose any combination of file formats.

. Metasearch for cache copies links in Yandex, Google, Yahoo, MSN (Bing). A list is compiled, each item of which contains all the snippets found for each link by each search engine. There are no analogues.

. Deep Search for Yandex, Google and Rambler allows you to combine into one list all links from the regular search and all links, respectively, from the lists “More from the site”, “Additional results from the site” and “Search on the site (Total...)”. Read more about deep search on the forum. There are no analogues.

. Accurate and complete search . This means the following. On the one hand, each query can be executed on that and only on the source in whose query language it is written. This exact search. On the other hand, there can be an arbitrary number of such requests and sources. This provides full search. Read more in a separate post about procedural search. There are no analogues.

. Searching the Invisible Internet .

    It includes the following basic features:

    A special package of requests that can be improved by the User,
    - search for invisible links using a spider,
    - search for invisible links in the vicinity of a visible link or folder by “image and likeness”,
    - special searches for open folders,
    - search for invisible links and folders with standard names using special dictionaries,
    - use of your own searches built into sites.

    More details in a separate publication on SiteSputnik Invisible. The basic functions are “well known in narrow circles,” but the way they are used has no analogues. The essence of this method is to build a site map visible from the Internet (in other words, materialize the visible Internet), and only on the basis of visible links and search for invisible links in relation to them. Searching for already visible links using “invisible” methods is not carried out.

B. Information monitoring functions.

. Monitoring for appearance on the Internet new links on a given topic. Monitor appearance new links can be used using integers request packets , which involve any of the search methods mentioned above, rather than individual search engine front pages. Implemented union and intersection new links from multiple separate searches. More details in the publication on monitoring (see § 1) and on the forum. There are no analogues.

. Collective information processing . Creation corporate or professional network for collective collection, monitoring and analysis of information. The participants and creators of such a network are employees of the corporation, members of a professional community or interest groups. The geographic location of the participants does not matter. More details in a separate publication about organizing a network for collective collection, monitoring and analysis of information.

. Monitoring links (web pages) to detect changes in their content (content). Beta version. Found changes are highlighted with color and special symbols. More details in a separate publication on monitoring (see § 2 and 3).

IN. Information analysis functions.

. Categories of materials already described above. More details can be found in a separate publication about Rubrics. Rules for entering Rubrics allow you to specify keywords and the distance between them, set logical “AND”, “OR” and “NOT”, apply a multi-level bracket structure and dictionaries (insert files) to which you can apply logical operations.

. VF technology - almost arbitrary expansion of the possibility of categorizing materials through the implementation of external functions that are organically integrated into the Rules for entering Rubrics and can be implemented by the programmer independently without the participation of the program author.

. Numerical analysis occupancy of Rubriks, installation alarm and notification of the occurrence of significant events by highlighting the Rubrics in color and/or sending an alarm report by e-mail.

. Factual relevance. There is an option to arrange the links in order close to significance these links in relation to the problem being solved, bypassing the tricks of webmasters who use various methods to increase the ranking of sites in search engines. This is achieved by analyzing the results of executing several “diverse” queries on a given topic. In the literal sense of the word, links containing maximum required information . Read more in the description of how to find the optimal supplier and on the forum. There are no analogues.

. Calculating Object Relationships - search for links, resources (sites), folders and domains on which objects are simultaneously mentioned. The most common objects are people and firms. To search for connections, all program tools mentioned on this page can be used SiteSputnik, which significantly increases the efficiency of the work you do. The operation is performed on any number of objects. More details in the introduction to the program, as well as in the description new feature"objects and their connections." There are no analogues.

. Formation, integration and intersection of information flows on a variety of topics, comparison of threads. More details in a separate post on threads.

. Building web maps sites, resources, folders and searched objects based on those found on the Internet when Google help, Yahoo, Yandex, MSN (Bing) and Altavista links belonging to the site. Experts can find out: is it visible "extra" information from the Internet on their websites, as well as research competitors’ websites on this subject. Web sitemap is materialization of the visible internet . More details in a separate publication about building web maps, visually at video. There are no analogues.

. Finding new sources of information on a given topic, which can then be used to track the emergence of new relevant information. More details at.

G. Service functions.

. Task Scheduler provides work Scheduled: performs specified program functions at a given time. More details in a separate publication about the Planner.

. Project Instructor NEW- this is an assistant creation and maintenance Projects for searching, collecting, monitoring and analyzing information (categorization and signaling). More details on the forum.

. Automatic archiving. IN databases All the results of your work are automatically remembered, namely: requests, request packages, search and monitoring protocols, any other of the above functions and the results of their execution. Can structure work on topics and subtopics.

. Database includes sorting, simple search and custom search by SQL query. For the latter, there is a wizard for composing SQL queries. Using these tools, you can find and review the work you did yesterday, last month, a year ago, define a topic as a search criterion, or set another search criterion based on the contents of the database.

. Technical limitations search engines. Some limitations, such as the length of the query string, can be overcome. It ensures the execution of not one, but several queries, combining search results or separately. You can read about a way to overcome violation of the law of additivity for major search engines. For one word or one phrase enclosed in quotes, a case-sensitive search in search engines is implemented, in particular, search by abbreviation.

Built-in browser . Navigator by page. Multicolor marker to highlight key and arbitrary words. Bilisting and N-listing from generated documents.

. Unloading news feeds into a table view focused on import in Excel, MySQL, Access, Kronos and other Applications.


5. Installation and launch of the Program, computer requirements.

To install and run the program:

  • Download the file, copy the FileForFiles folder from it to your HDD, for example, on D:\;

  • Demo version of the program will be installed and it will open.

  • The program will work on any computer on which it is installed Windows any versions.