
MOSS Search!!!
Hi All,
This is my first blog post, and it is about MOSS Search.
I hope you guys enjoy it!!!
OK! First, let us understand what search is. We use search engines such as search.microsoft.com, google.com [my favourite], yahoo.com, etc. to find details on what we need. In MOSS, we search the corporate data and, if necessary, other content too. Search in MOSS is highly extensible, and its ranking algorithm aims to bring the most relevant results to the top.
Note: This post is only meant to give a basic understanding of MOSS Search.
OK, now let us dive into the architecture.
Searching and indexing are the most important parts of MOSS Search, and they are interconnected.
Because of this, a SharePoint server can host the search and index roles together, or the two roles can be placed on two different servers.
Some technical terms used in Search:
Index, Content Source, Crawl, Propagation, Property Store. There are many more, which we will come to understand later in this blog.
Index:
The index is just like the index of a book. It is stored at a location on the file system and holds, for each word, a reference to where that word occurs in the content.
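To make the book-index idea concrete, here is a minimal sketch in Python. It is not how MOSS stores its index; the function name `build_index` and the sample documents are made up for illustration. It only shows the idea of mapping each word to the places where it occurs.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the (document, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_name, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_name, position))
    return dict(index)


# Hypothetical sample content, standing in for crawled documents.
documents = {
    "policy.docx": "Travel policy for all employees",
    "faq.html": "Travel FAQ and expense policy",
}

index = build_index(documents)
print(index["policy"])  # [('policy.docx', 1), ('faq.html', 4)]
```

Looking up a word is then a single dictionary access, just like flipping to the back of a book instead of reading every page.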
Content Source:
A content source is a collection of start addresses representing content that should be crawled by the search index component. A content source also specifies settings that define the crawl behavior and the schedule on which the content will be crawled.
Enterprise Search provides several types of content sources by default, so it is easy to configure crawls to different types of data, both internal and external.
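A content source can be pictured as a small record: a name, a type, a list of start addresses, and the crawl settings. The sketch below is only a mental model in Python. The class `ContentSource`, its field names, and the example address are assumptions for illustration, not the MOSS object model.

```python
from dataclasses import dataclass, field


@dataclass
class ContentSource:
    """Hypothetical model of a content source (not the MOSS API)."""
    name: str
    source_type: str                      # e.g. "SharePoint content", "Web content"
    start_addresses: list = field(default_factory=list)
    crawl_behavior: str = "crawl everything under each start address"
    crawl_schedule: str = "none"          # free-text placeholder for a schedule

    def add_start_address(self, address):
        # Avoid listing the same start address twice.
        if address not in self.start_addresses:
            self.start_addresses.append(address)


intranet = ContentSource(name="Intranet", source_type="SharePoint content")
intranet.add_start_address("http://intranet.example.com")
intranet.add_start_address("http://intranet.example.com")  # ignored duplicate
intranet.crawl_schedule = "daily"
print(intranet.start_addresses)  # ['http://intranet.example.com']
```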
Architecture and How Stuff Works!!!
Search Role:
Phase 1:
+ Gatherer: Gathers the search query and sends it to the word breaker for processing
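+ Word breakers: Break compound words and phrases into individual words or tokens. The query engine uses them.
+ Stemmers: Reduce inflected words to their stem, for example by handling endings such as -ing and -ed.
+ Noise words [is, a, and, etc.], which are listed in noise files, are removed.
The remaining words are sent to the next phase.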
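The Phase 1 steps above can be sketched roughly as follows. This is a toy illustration in Python, not the MOSS implementation: the noise word list, the suffix list, and the function names are all assumptions, and real word breakers and stemmers are language-specific and far smarter than this.

```python
import re

# Hypothetical noise words; MOSS reads these from noise files.
NOISE_WORDS = {"is", "a", "and", "the", "of"}
# Very naive stemming: only the two endings mentioned above.
SUFFIXES = ("ing", "ed")


def break_words(query):
    """Word breaker: split the query into lowercase tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def stem(word):
    """Stemmer: strip a known suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def process_query(query):
    """Break the query into words, stem them, then drop noise words."""
    tokens = [stem(token) for token in break_words(query)]
    return [token for token in tokens if token not in NOISE_WORDS]


print(process_query("Searching and indexing a crawled document"))
# ['search', 'index', 'crawl', 'document']
```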
Indexer Role:
Phase 1:
THE FILTER DAEMON:
******* Components: IFilter + Protocol Handler *******
+ Filter Daemon: Uses the appropriate protocol handler to open the documents and other content in the content source.
+ Protocol Handler: Opens the documents and other content in the content source in their native format and exposes them to the IFilter.
+ IFilter: Extracts chunks of text and properties from the content opened by the protocol handler and passes them to the index engine.
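Here is a rough sketch of how these three pieces hand work to each other. Everything in it is a stand-in written in Python: the class names, the fake "file share" dictionary, and the header format of the sample document are assumptions for illustration, and the real components are COM-based and much more involved.

```python
class FileProtocolHandler:
    """Hypothetical protocol handler: opens an item and returns its raw content."""

    def __init__(self, store):
        self.store = store  # stands in for a file share: address -> raw text

    def open(self, address):
        return self.store[address]


class PlainTextFilter:
    """Hypothetical stand-in for an IFilter: splits raw content into text and properties."""

    def extract(self, raw):
        # Assumed sample format: "Name: value" header lines, a blank line, then the body.
        header, _, body = raw.partition("\n\n")
        properties = dict(
            line.split(": ", 1) for line in header.splitlines() if ": " in line
        )
        return body.strip(), properties


class FilterDaemon:
    """Picks the protocol handler and filter for an address and runs them in order."""

    def __init__(self, handlers, filters):
        self.handlers = handlers  # scheme -> protocol handler
        self.filters = filters    # file extension -> filter

    def process(self, address):
        scheme = address.split("://", 1)[0]
        extension = address.rsplit(".", 1)[-1]
        raw = self.handlers[scheme].open(address)
        return self.filters[extension].extract(raw)


store = {
    "file://share/notes.txt": "Author: Asha\nTitle: Crawl notes\n\nThe crawl runs nightly."
}
daemon = FilterDaemon({"file": FileProtocolHandler(store)}, {"txt": PlainTextFilter()})
text, properties = daemon.process("file://share/notes.txt")
print(text)        # The crawl runs nightly.
print(properties)  # {'Author': 'Asha', 'Title': 'Crawl notes'}
```

The text and properties returned here are what would be handed to the index engine in the next phase.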
Phase 2:
+ Index Engine: Processes the text and properties of the content source and puts them in the content index and the property store.
+ Content Index - Holds the location of each word in the content source.
+ Property Store - A table in the SQL database that stores the filtered properties.
=========================================================
Properties | Associated Values | Security | Map to the word in the content index
=========================================================
So references are created at two locations:
• At the File system Level – Content Index
• At the DB Level – Property Store
The text and the property are mapped...
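The two-location idea can be sketched like this. It is a simplified illustration in Python, not the real MOSS schema: an in-memory dictionary stands in for the content index on the file system, an in-memory SQLite table stands in for the property store, and the table layout, the `doc_id` key, and the function name `index_item` are assumptions. The security information is left out to keep it short.

```python
import sqlite3
from collections import defaultdict


def index_item(doc_id, text, properties, content_index, property_store):
    """Write the words to the content index and the properties to the property store."""
    for position, word in enumerate(text.lower().split()):
        content_index[word].append((doc_id, position))
    property_store.executemany(
        "INSERT INTO properties (doc_id, name, value) VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in properties.items()],
    )


content_index = defaultdict(list)             # stand-in for the file system index
property_store = sqlite3.connect(":memory:")  # stand-in for the SQL database
property_store.execute(
    "CREATE TABLE properties (doc_id INTEGER, name TEXT, value TEXT)"
)

index_item(
    1,
    "Crawl notes for the intranet",
    {"Author": "Asha", "Title": "Crawl notes"},
    content_index,
    property_store,
)

# A word hit in the content index leads to the item's properties via doc_id.
doc_id, _ = content_index["crawl"][0]
rows = property_store.execute(
    "SELECT name, value FROM properties WHERE doc_id = ? ORDER BY name", (doc_id,)
).fetchall()
print(rows)  # [('Author', 'Asha'), ('Title', 'Crawl notes')]
```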
To do:
A crawled property has to be mapped to a managed property for it to be included in the search.
Logic:
Crawled properties come from the content sources being crawled:
• SharePoint content
• Web content
• File share content
• Exchange folder content
• Business data content
Managed properties are what the user's search runs against.
Hence the two have to be mapped.
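A small sketch of the mapping idea, in Python. The crawled property names such as `file:Author` and `sharepoint:CreatedBy`, and the function `to_managed`, are invented for illustration and do not match the real MOSS naming. The point is only that a crawled property with no mapping never reaches the searchable side.

```python
# Hypothetical mapping: managed property -> crawled properties that can feed it,
# in order of preference.
MAPPINGS = {
    "Author": ["sharepoint:CreatedBy", "file:Author"],
    "Title": ["sharepoint:Title", "file:Title"],
}


def to_managed(crawled_properties, mappings):
    """Return the managed properties that can be filled from the crawled ones."""
    managed = {}
    for managed_name, crawled_names in mappings.items():
        for crawled_name in crawled_names:
            if crawled_name in crawled_properties:
                managed[managed_name] = crawled_properties[crawled_name]
                break  # the first matching crawled property wins
    return managed


# Properties picked up while crawling a file share item (made-up values).
crawled = {"file:Author": "Asha", "file:PageCount": "12"}
print(to_managed(crawled, MAPPINGS))  # {'Author': 'Asha'}
```

Here `file:PageCount` was crawled but is not mapped to any managed property, so a user could not search on it.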
Please refer to the flow diagram to get a bird's-eye view of the architecture.
Comments are Welcome!!!