How Search Engines Work

This guide introduces how search engines work.

Search engines work by crawling hundreds of billions of pages using their own web crawlers. These web crawlers are commonly referred to as search engine bots or spiders.
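As a rough illustration of crawling, the sketch below walks a tiny in-memory link graph breadth-first, much as a crawler follows links from one page to the next. The `LINKS` graph and the `crawl` function are hypothetical names invented for this example; a real crawler fetches pages over HTTP and operates at a vastly larger scale.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}


def crawl(seed, links, max_pages=100):
    """Return URLs in the order a breadth-first crawler would discover them."""
    discovered = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target in seen:
                continue  # already discovered, do not queue it again
            if len(discovered) >= max_pages:
                return discovered  # crawl budget exhausted
            seen.add(target)
            discovered.append(target)
            queue.append(target)
    return discovered
```

Calling `crawl("https://example.com/", LINKS)` visits the seed first and then each newly found link once, however many pages point to it.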

Search Engine Index

Web pages discovered by a search engine are added to a data structure called an index.

The index includes every discovered URL, along with a number of relevant key signals about the content of each URL, such as:

– How do the page and its domain relate to each other?
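One way to picture such an index is a mapping from each URL to a record of its signals, alongside an inverted index from keywords to the URLs that contain them. The sketch below is a minimal illustration only: the `PageRecord` fields and the `build_index` helper are hypothetical, and the inverted index is a common technique assumed here rather than something this guide describes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical per-URL record; real engines store many more signals."""
    url: str
    text: str
    signals: dict = field(default_factory=dict)


def build_index(pages):
    """Build a URL -> record map and a keyword -> set-of-URLs inverted index."""
    records = {}
    inverted = defaultdict(set)
    for page in pages:
        records[page.url] = page
        # Naive tokenisation: lowercase and split on whitespace.
        for word in page.text.lower().split():
            inverted[word].add(page.url)
    return records, inverted
```

Looking up a keyword in `inverted` returns the set of URLs whose text contains it, which is what lets a query be matched without rescanning every page.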

The aim of a search engine algorithm is to present a relevant set of high-quality search results that fulfil the user's query or question as quickly as possible.

When a user enters a search query into a search engine, all pages deemed relevant are identified from the index, and an algorithm is used to rank those pages hierarchically into a set of results.
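The retrieve-then-rank step can be sketched as follows. This toy example scores each page by how often the query terms appear in it, a scoring rule assumed purely for illustration; real ranking algorithms combine many more signals. The `rank` function and the sample `PAGES` are hypothetical.

```python
from collections import Counter

# Hypothetical indexed pages: URL -> page text.
PAGES = {
    "https://example.com/coffee": "coffee beans and coffee brewing guide",
    "https://example.com/tea": "tea brewing guide",
    "https://example.com/bikes": "road bikes and mountain bikes",
}


def rank(query, pages):
    """Return the URLs of relevant pages, best match first."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:  # keep only pages deemed relevant to the query
            scored.append((score, url))
    # Sort by descending score, breaking ties by URL for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]
```

Pages that match none of the query terms are dropped before ranking, mirroring the two stages described above: identify the relevant pages, then order them.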

In addition to the search query, search engines use other relevant data to return results, including:

Location – Some search queries are location-dependent, e.g. "cafes near me" or "movie times".
Language detected – Search engines return results in the user's language, if it can be detected.
Device – A different set of results may be returned based on the device from which the query was made.
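To illustrate how such contextual data might shape results, the sketch below filters candidate results by detected language and then moves results matching the user's location to the front. The `SearchContext` and `Result` types and the ordering rule are hypothetical simplifications, not a description of how any real engine behaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchContext:
    """Hypothetical context sent along with a query."""
    location: Optional[str] = None
    language: Optional[str] = None
    device: Optional[str] = None  # carried for completeness, unused below


@dataclass
class Result:
    url: str
    language: str
    city: Optional[str] = None


def personalise(results, context):
    """Filter by detected language, then put local results first."""
    if context.language is not None:
        results = [r for r in results if r.language == context.language]
    if context.location is None:
        return list(results)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(results, key=lambda r: r.city != context.location)
```

When neither a language nor a location is known, the results come back in their original order.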

There are a number of circumstances where a URL will not be indexed by a search engine. This may be due to:

Robots.txt file exclusions – a file which tells search engines what they shouldn’t visit on your site.
Directives on the webpage telling search engines not to index that page (noindex tag) or to index another similar page (canonical tag).
Search engine algorithms judging the page to be of low quality, to have thin content or to contain duplicate content.
The URL returning an error page (e.g. a 404 Not Found HTTP response code).
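Several of the checks above can be sketched as a single function. This example uses Python's standard `urllib.robotparser` module to evaluate robots.txt rules; the `is_indexable` function, its parameters, the `ExampleBot` user agent and the sample rules are all hypothetical. The quality judgement is left out because it depends on each engine's own algorithms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the site being crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_indexable(url, status_code, noindex=False, canonical=None,
                 robots_txt=ROBOTS_TXT, user_agent="ExampleBot"):
    """Return (indexable, reason) for a crawled URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        return False, "blocked by robots.txt"
    if status_code >= 400:  # error responses such as 404 Not Found
        return False, f"HTTP error {status_code}"
    if noindex:
        return False, "noindex directive"
    if canonical is not None and canonical != url:
        return False, "canonical points to another URL"
    return True, "ok"
```

For example, `is_indexable("https://example.com/private/x", 200)` reports the URL as blocked by robots.txt, while the same call for a path outside `/private/` reports it as indexable.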


Author

Sam Marsden is DeepCrawl's SEO & Content Manager. He also contributes to industry publications such as Search Engine Journal and State of Digital.