Monday 29 December 2014

How to scrape address from Google Maps

If you want to build a new online directory based website and want it to be popular with latest web contents, then you need the help of web scraping services from iWeb scraping. If you want to scrape address from maps.google.com, there is a specialized web scraping tool developed by iWeb scraping which can do the job for you. There are plenty of benefits with web scraping which includes market research, gathering customer information, managing product catalogs, compare prices, gather real estate data, gather job posting information etc. Web scraping technology is very popular nowadays and it saves lot of time and effort involved in manual extraction of data from websites.

The web scraping tools developed iWeb Scraping is very user-friendly and can extract specific information from targeted websites. It converts data from HTML web pages to useful formats like Excel spread sheets or Access database. Whatever web scraping requirements you have, you can contact iWeb Scraping as they have more than 3.5 years of web data extraction experience and offer the best prices in the industry. Also their services are available in 24x7 basis and free pilot projects will be done based on request.

Companies which require specific web data and look for an application which can automate the process and export the HTML data in structured format could benefit greatly from web scraping applications of iWeb scraping. You can easily extract data from multiple target websites, parse and re-assemble the information in HTML format to database or spread sheets as you wish. The application has simple point-and-click user-interface and any beginner can use it scrape address from Google Maps. If you want to gather address of people in particular region from Google maps, you can do it with help of web scraping application developed by iWebscraping.

Web Scraping is a technology that able to digest target website databases that are visible only as HTML web pages, and create a local, identical replica of those databases as a information or result. With our web scraping & web data extraction service we can capture web pages, then pin-point specific pieces of data/information you'd like to extract from web pages. What is needed in this process is much more than a Website crawler and set of Website wrappers. The time required to do web data extraction goes down in comparison to manually data copying and pasting job.

Source:http://www.articlesbase.com/information-technology-articles/how-to-scrape-address-from-google-maps-4683906.html

Wednesday 24 December 2014

Limitations and Challenges in Effective Web Data Mining

Web data mining and data collection is critical process for many business and market research firms today. Conventional Web data mining techniques involve search engines like Google, Yahoo, AOL, etc and keyword, directory and topic-based searches. Since the Web's existing structure cannot provide high-quality, definite and intelligent information, systematic web data mining may help you get desired business intelligence and relevant data.

Factors that affect the effectiveness of keyword-based searches include:

• Use of general or broad keywords on search engines result in millions of web pages, many of which are totally irrelevant.

• Similar or multi-variant keyword semantics my return ambiguous results. For an instant word panther could be an animal, sports accessory or movie name.

• It is quite possible that you may miss many highly relevant web pages that do not directly include the searched keyword.

The most important factor that prohibits deep web access is the effectiveness of search engine crawlers. Modern search engine crawlers or bot can not access the entire web due to bandwidth limitations. There are thousands of internet databases that can offer high-quality, editor scanned and well-maintained information, but are not accessed by the crawlers.

Almost all search engines have limited options for keyword query combination. For example Google and Yahoo provide option like phrase match or exact match to limit search results. It demands for more efforts and time to get most relevant information. Since human behavior and choices change over time, a web page needs to be updated more frequently to reflect these trends. Also, there is limited space for multi-dimensional web data mining since existing information search rely heavily on keyword-based indices, not the real data.

Above mentioned limitations and challenges have resulted in a quest for efficiently and effectively discover and use Web resources. Send us any of your queries regarding Web Data mining processes to explore the topic in more detail.

Source: http://ezinearticles.com/?Limitations-and-Challenges-in-Effective-Web-Data-Mining&id=5012994

Thursday 18 December 2014

Data Extraction - A Guideline to Use Scrapping Tools Effectively

So many people around the world do not have much knowledge about these scrapping tools. In their views, mining means extracting resources from the earth. In these internet technology days, the new mined resource is data. There are so many data mining software tools are available in the internet to extract specific data from the web. Every company in the world has been dealing with tons of data, managing and converting this data into a useful form is a real hectic work for them. If this right information is not available at the right time a company will lose valuable time to making strategic decisions on this accurate information.

This type of situation will break opportunities in the present competitive market. However, in these situations, the data extraction and data mining tools will help you to take the strategic decisions in right time to reach your goals in this competitive business. There are so many advantages with these tools that you can store customer information in a sequential manner, you can know the operations of your competitors, and also you can figure out your company performance. And it is a critical job to every company to have this information at fingertips when they need this information.

To survive in this competitive business world, this data extraction and data mining are critical in operations of the company. There is a powerful tool called Website scraper used in online digital mining. With this toll, you can filter the data in internet and retrieves the information for specific needs. This scrapping tool is used in various fields and types are numerous. Research, surveillance, and the harvesting of direct marketing leads is just a few ways the website scraper assists professionals in the workplace.

Screen scrapping tool is another tool which useful to extract the data from the web. This is much helpful when you work on the internet to mine data to your local hard disks. It provides a graphical interface allowing you to designate Universal Resource Locator, data elements to be extracted, and scripting logic to traverse pages and work with mined data. You can use this tool as periodical intervals. By using this tool, you can download the database in internet to you spread sheets. The important one in scrapping tools is Data mining software, it will extract the large amount of information from the web, and it will compare that date into a useful format. This tool is used in various sectors of business, especially, for those who are creating leads, budget establishing seeing the competitors charges and analysis the trends in online. With this tool, the information is gathered and immediately uses for your business needs.

Another best scrapping tool is e mailing scrapping tool, this tool crawls the public email addresses from various web sites. You can easily from a large mailing list with this tool. You can use these mailing lists to promote your product through online and proposals sending an offer for related business and many more to do. With this toll, you can find the targeted customers towards your product or potential business parents. This will allows you to expand your business in the online market.

There are so many well established and esteemed organizations are providing these features free of cost as the trial offer to customers. If you want permanent services, you need to pay nominal fees. You can download these services from their valuable web sites also.

Source: http://ezinearticles.com/?Data-Extraction---A-Guideline-to-Use-Scrapping-Tools-Effectively&id=3600918

Monday 15 December 2014

Do blog scraping sites violate the blog owner's copyright?

I noticed that my blog has been posted on one of these website scraping sites. This is the kind of site that has no original content, but just repeats or scrapes content others have written and does it to get some small amount of ad income from ads on the scraping site. In essence the scraping site is taking advantage of the content of the originating site in order to make a few dollars from people who go to the site looking for something else. Some of these websites prey on misspelling. If you accidentally misspell the name of an original site, you just may end up with one of these patently commercial scraping sites.

Google defines scraping as follows:

•    Sites that copy and republish content from other sites without adding any original content or value
•    Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
•    Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user

My question, as set out in the title to this post, is whether or not scraping is a violation of copyright. It turns out that the answer is likely very complicated.  You have to look at the definition of a scraping site very carefully. Let me give you some hypotheticals to show what I mean.

Let's suppose that I write a blog and put a link in my blog post to your blog. Does that link violate your copyright? I can't imagine that anyone would think that there was problem with linking to another website on the Web. In this case, there is no content from the originating site, just a link.

But let's carry the hypothetical a little further. What if I put a link to your site and quote some of your content? Does this violate copyright law? If you are acquainted with any of the terminology of copyright law; think fair use. The issue here is whether or not the "quoted" material is a substantial reproduction of the entire original content? I would have the opinion that duplicating an entire blog post either with or without attribution would be a violation of the originator's copyright.

So is the scraping website protected by the "fair use" doctrine? Does the fact that the motivation for listing the original websites is to make money have anything to do with how you would decide if there was or was not a violation of the originator's copyright? By the way, the copyright does not make a distinction between a commercial and non-commercial use of the original constituting or not constituting a violation of copyright. The fact that the reproducing (scraping) party does not make money from the reproduction is not a factor in the issue of violation, although it may ultimately be an issue as to the amount of damages assessed.

Does the fact that the actions of the scraper annoy me, make any difference? I would answer, not in the least. Whether or not you are annoyed by the violation of the copyright makes no difference as to whether or not there is a violation. Likewise, you have no independent claims for your wounded feelings because of the copied content. Copyright is a statutory action (i.e. based on statutory law) and unless the cause of action is recognized by the law, there is no cause of action. Now, in an outrageous case, you may have  some kind of tort (personal injury) claim, but that is way outside of my hypothetical situation.

So what is the answer? Does scraping violate the originator's copyright? If only a small portion of the blog is copied (scraped) then I would have to have the opinion that it is not. Essentially, no matter what the motivation of the scrapper, there is not enough content copied to violate the fair use doctrine. Now, that is my opinion. Your's might differ. That is what makes lawsuits.

Do I think there are other reasons why scraping websites are objectionable? Certainly, but those reasons have nothing to do with copyright and they are probably the subject of another different blog post. So, if you are reading this from scraping website, bear in mind that there may be a serious problem with that type of website.

Source:http://genealogysstar.blogspot.in/2013/05/do-blog-scraping-sites-violate-blog.html

Thursday 11 December 2014

Seven tools for web scraping – To use for data journalism & creating insightful content

I’ve been creating a lot of (data driven) creative content lately and one of the things I like to do is gathering as much data as I can from public sources. I even have some cases it is costing to much time to create and run database queries and my personal build PHP scraper is faster so I just wanted to share some tools that could be helpful. Just a short disclaimer: use these tools on your own risk! Scraping websites could generate high numbers of pageviews and with that, using bandwidth from the website you are scraping.

1. Scraper (Chrome plugin)

    Scraper is a simple data mining extension for Google Chrome™ that is useful for online research when you need to quickly analyze data in spreadsheet form.

You can select a specific data point, a price, a rating etc and then use your browser menu: click Scrape Similar and you will get multiple options to export or copy your data to Excel or Google Docs. This plugin is really basic but does the job it is build for: fast and easy screen scraping.

2. Simple PHP Scraper

PHP has a DOMXpath function. I’m not going to explain how this function works, but with the script below you can easily scrape a list of URLs. Since it is PHP, use a cronjob to hourly, daily or weekly scrape the desired data. If you are not used to creating Xpath references, use the Scraper for Chrome plugin by selecting the data point and see the Xpath reference directly.

scraper-example

– Click here to download the example script.

3. Kimono Labs

Kimono has two easy ways to scrape specific URLs: just paste the URL into their website or use their bookmark. Once you have pointed out the data you need, you can set how often and when you want the data to be collected. The data is saved in their database. I like the facts that their learning curve is not that steep and it doesn’t look like you need a PHD in engineering to use their software. The disadvantage of this tool is the fact you can’t upload multiple URLs at once.

4. Import.io

Import.io is a browser based web scraping tool. By following their easy step-by-step plan you select the data you want to scrape and the tool does the rest. It is a more sophisticated tool compared to Kimono. I like it because of the fact it shows a clear overview of all the scrapers you have active and you can scrape multiple URLs at once.

5. Outwit Hub

I will start with the two biggest differences compared to the previous tool: it is a softwarepackage to use on your PC or laptop and to use its full potential it will cost you 75 USD. The free version can only scrape 100 rows of data. What I do like is the number of preprogrammed options to scrape which makes it easy to start and learn about web scraping.

6. ScraperWiki

This tool is really for people wanting to scrape on a massive scale. You can code your own scrapers (in PHP, Ruby & Python) and pricing is really cheap looking to what you can get: 29USD / month for 100 datasets. You are completely free in using libraries and timers. And if your programming skills are not good enough, they can help you out (paid service though). Compared to other tools, this is the most advanced tool that offers the basics of web scraping.

7. Fminer.com

This tool made it possible to finally scrape all the data inside Google Webmaster Tools since it can deal with JavaScript and AJAX interfaces. Read my extensive review on this page: Scraping Webmaster Tools with FMiner!

But on the end, building your individual project scrapers will always be more effective than using predefined scrapers. Am I missing any tools in this sum up of tools?

Source: http://www.notprovided.eu/7-tools-web-scraping-use-data-journalism-creating-insightful-content/