Blog Spider presentation


Slide 1: Blog Spider
Serhii Lukashov


Slide 2: Introducing
BlogSpider is a website project that lets a user crawl pages, find RSS channels on them, and store them.
The main goal of the project was to learn new technologies and dive into Akka.NET.
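
The slides do not say how BlogSpider finds RSS channels on a page. One common approach is feed autodiscovery: looking for `<link rel="alternate">` tags with a feed MIME type in the downloaded HTML. The sketch below illustrates that idea in Python using only the standard library; the names `FeedLinkParser` and `find_feeds` are hypothetical and are not taken from the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed in a <link> tag.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def find_feeds(html, base_url):
    """Return the feed URLs declared in the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
        '</head></html>'
    )
    print(find_feeds(page, "https://example.com/blog/"))
```

A crawler could call `find_feeds` on every downloaded page and store any URLs it returns.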

Slide 3: Project structure
The project consists of four main parts:
Lighthouse
Crawler
Tracker
Web application


Slide 4: Base crawling algorithm
A Web crawler is an essential component of search engines, data mining, and other Internet applications. It recursively downloads Web pages into local storage, as shown in the figure.
The operations can be briefly described in the following four steps:
a. Take a set of seed URLs as initial task URLs.
b. Select a URL from task URLs and download the page from the Web.
c. Extract the hyperlinks contained in the downloaded page and, if desirable, add the new URLs to the task URLs in strategic order.
d. Repeat steps b and c until either the task URLs become empty or the crawl is stopped by the application.
The strategy that determines the order in which task URLs are crawled is called crawl scheduling. Given a time window T, different scheduling strategies can lead to very different sets of Web pages crawled in T.
[Figure: basic crawler flow. Labels: Seed URLs, Initiate, Task URLs, Scheduling, Get next URL, URL requests, Web, Get page, Web pages, Extract URLs, database.]
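
Steps a to d and the effect of scheduling can be sketched in Python as follows. This is only an illustration under stated assumptions, not BlogSpider's code: the in-memory `FAKE_WEB` dictionary replaces real downloads, `max_pages` stands in for the time window T, and the names `crawl` and `fetch_links` are hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "seed": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its hyperlinks."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, strategy="fifo", max_pages=4):
    """Crawl from the seeds and return the URLs in the order they were visited."""
    tasks = deque(seeds)  # step a: seed URLs become the initial task URLs
    seen = set(seeds)
    crawled = []
    # step d: repeat until the task URLs are empty or the budget is used up
    while tasks and len(crawled) < max_pages:
        # step b: the scheduling strategy decides which task URL comes next
        url = tasks.popleft() if strategy == "fifo" else tasks.pop()
        crawled.append(url)
        # step c: extract links and add the new ones to the task URLs
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                tasks.append(link)
    return crawled


if __name__ == "__main__":
    print(crawl(["seed"], strategy="fifo"))  # ['seed', 'a', 'b', 'a1']
    print(crawl(["seed"], strategy="lifo"))  # ['seed', 'b', 'b1', 'a']
```

With the same budget of four pages, the first-in-first-out order reaches `a1` while the last-in-first-out order reaches `b1` instead, which is the kind of difference the scheduling strategy makes.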

Slide 5: Base concept of crawler cluster
Here you can see the basic roles that must be present in a crawler cluster.
Web: the web application that starts crawl jobs.
Tracker: the service that tells the cluster what needs to be crawled.
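
To make the roles more concrete, here is a small Python sketch of how they might interact. The message flow is an assumption, since the slides do not spell it out: plain method calls stand in for messages between cluster nodes, the crawler worker is modeled on the Crawler part from the project structure, and all class and method names are hypothetical.

```python
from collections import deque


class Tracker:
    """Keeps the pending URLs and tells crawlers what to crawl next."""

    def __init__(self):
        self.pending = deque()
        self.known = set()

    def add_job(self, urls):
        # Ignore URLs that were already submitted.
        for url in urls:
            if url not in self.known:
                self.known.add(url)
                self.pending.append(url)

    def next_url(self):
        return self.pending.popleft() if self.pending else None


class Crawler:
    """Asks the tracker for work; the actual download is left out."""

    def __init__(self, name, tracker):
        self.name = name
        self.tracker = tracker

    def work_once(self):
        url = self.tracker.next_url()
        return None if url is None else (self.name, url)


class Web:
    """Web application role: starts crawl jobs by handing URLs to the tracker."""

    def __init__(self, tracker):
        self.tracker = tracker

    def start_job(self, urls):
        self.tracker.add_job(urls)


def demo():
    tracker = Tracker()
    web = Web(tracker)
    crawlers = [Crawler("crawler-1", tracker), Crawler("crawler-2", tracker)]
    web.start_job(["https://example.com/", "https://example.org/"])
    return [crawler.work_once() for crawler in crawlers]


if __name__ == "__main__":
    print(demo())
```

In this sketch the tracker is the only component that owns the list of pending URLs, so several crawlers can ask it for work without crawling the same page twice.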


Slide 6: What is Lighthouse?
Lighthouse is a dedicated seed-node tool for our cluster. It only has to be updated when Akka.Cluster itself is upgraded. It is not deployed as part of your application, so it should never have to be redeployed when you change the application, but it does need to be upgraded whenever Akka.Cluster gets upgraded.

Slide 7: Let's look at how it works


Slide 8: Project conclusions
In this project I learned a lot about implementing a cluster with Akka.NET, and I also learned frameworks such as Topshelf, Quartz.NET, and SignalR.
