Friday, 31 May 2013

LinkedIn Data Extraction

Looking for the latest information about various business profiles?

LinkedIn is one website that contains very useful information about business personnel. Information such as a contact's name, location, street address, email ID and network connections can easily be extracted using the services of www.iwebscraping.com. Many business users who need this data choose to scrape it from the LinkedIn website. The scraped data is structured and well organized, so users can integrate it into their business activities to produce unique business solutions. The output is usually presented on screen in a user-friendly format, and the data is generally stored in MS Access, CSV, MS Excel or MySQL format for temporary or permanent use.

Integrate data into your business activities to produce unique business solutions.

Another data extraction technique widely used by business users is data scraping. Scraping the Facebook website transforms unstructured data into an equivalent structured form that can be analyzed and processed according to business needs. One of the major web scraping service providers is www.iwebscraping.com, and people looking to scrape data from Facebook can visit this company's online store. The scraping process usually involves extracting and collecting data from Facebook pages and profiles and saving it in a screen output format that is easy for users to read and understand.
The process is fast and allows users to extract bulk data.

Data extraction from LinkedIn is the term given to extracting image and picture data from the LinkedIn website. Various data extraction service providers offer useful services to their millions of customers; one such provider is www.iwebscraping.com. The site features tools that users can download to extract data from LinkedIn. The automatic extraction process is fast and allows users to extract bulk data, so they can focus on other high-priority work while the data is being extracted. Clearly, automatic data extraction has made the life of analysts far simpler and easier.

Source: http://www.iwebscraping.com/linkedin-website-scraper.php

Wednesday, 29 May 2013

Preventing spamBots From Harvesting or Scraping Email Addresses From Web Pages

Spam has become a worldwide epidemic, and prevention is the current focus. A day will come when a cure is sought more aggressively than a bandage, but for now companies are making tons of money selling us filters and spam prevention kits. Our approach at the PHP Kemist is to byte back with encrypted or obfuscated email addresses that spamBots cannot scrape or harvest from your web pages.

spamBots are robots that spider sites looking for email addresses by searching for text that matches the pattern of an email address. They act as crawlers, collecting email addresses from every page that contains a pattern match. spamBots started out as simple programs that were fed lists of web addresses and methodically worked through all available links seeking email addresses to collect. The general public and most webmasters were unaware of this process and were loading web pages with email addresses to give customers more ways to easily contact store owners and business representatives. Unfortunately, this lack of awareness bred the modern age of the intelligent spamBot.

As spamBots became common knowledge and the Online community grew angry about the spam they were receiving, companies moved in to provide solutions for a price. Their anti-spam solutions work anywhere from poorly to really well. Our own anti-spam through Go Daddy reduced spam to less than 1%, which has been extremely easy to manage. However, with anti-spam measures comes the responsibility to check filtered messages for incorrectly tagged email, which must be tagged as “good,” else you may start to lose email. The training process for anti-spam is fairly easy, but requires methodical and regular diligence.

While anti-spam looked great to many companies and email users, webmasters continued to provide an easily accessible list of company emails through web page publications. spamBot programmers became more savvy and spamBots grew more powerful. Third world countries got into the game using Internet bars and quickly found a new source of revenue. Without getting into the world of scams and Ponzi schemes, email rapidly matured from simple communications into a seething pit of crap from which we had to carefully pluck our good email messages. The problem has continued to grow while software companies remain reluctant to fix it, since they are cashing in on temporary solutions.

So, what is the cure to the problem, one might ask? There are many aspects of the solution that require effort from different members of the software companies and Online communities. From the perspective of the webmaster, our part of the cure is to stop providing spamBots with the food that keeps them alive. Stop placing email addresses in easily accessible locations with simple formats. There are a few simple strategies webmasters can use to prevent spam and spamBots. How these strategies are applied can vary greatly from webmaster to webmaster, but they typically fall from the webmaster’s hands to the programmer’s hands.

1. Email is a string of characters that creates a recognizable pattern. If we break the expected pattern, spamBots may overlook the email address and move on. A common and simple method of pattern breaking is Unicode character replacement, written in HTML as numeric character references. Browsers interpret these references efficiently and convert them back into alphanumeric characters. Take the email address bob@hates-spam.com, perform Unicode replacement, and that email address becomes:

&#98;o&#98;&#64;hates-spam&#46;com

The letter b was replaced with the Unicode string &#98;, which at least 97% of spamBots do not pick up. Simply replacing the @ symbol with the &#64; string obfuscates the email address from most pattern-matching spamBots. Add the replacement of the period with &#46; and you have a string of characters that spamBots are likely to ignore.
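Generating the encoded form by hand is tedious, so here is a minimal Python sketch of the replacement in item 1. It writes each selected character as an HTML numeric character reference (&#NN;, where NN is the character's code point), which is what browsers decode back into text. The function names are hypothetical, and which characters to encode is a choice left to the caller.

```python
import html


def obfuscate_email(address, chars_to_encode="@."):
    """Replace selected characters with HTML numeric character references."""
    return "".join(
        f"&#{ord(ch)};" if ch in chars_to_encode else ch for ch in address
    )


def obfuscate_all(address):
    """Encode every character: the most thorough (and least readable) variant."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = obfuscate_email("bob@hates-spam.com", chars_to_encode="b@.")
print(encoded)  # &#98;o&#98;&#64;hates-spam&#46;com
# A browser decodes the references back to the original text, as html.unescape does
assert html.unescape(encoded) == "bob@hates-spam.com"
```

The encoded string goes into the page source (for example inside a mailto link); visitors see the normal address, while a regex run over the raw HTML finds no name@domain.tld pattern.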

2. Dynamic character replacement is a more advanced method of obfuscation. Using the Javascript programming language, your web page can dynamically generate email address links and trigger the mailto command from the browser. This method is slightly more advanced than Unicode character substitution, but is still pretty easy to integrate into any website.

The first component of dynamic character replacement is creating an array of the characters in the email address in the page. Replacement characters can follow any convention, but we’ll use numeric replacements for this example. Let’s consider the alphabet, with the letter A being the first letter. We use the number 1 to replace the letter A, 2 for B, 3 for C, etc. We then create a segment of Javascript code in the email address hyperlink that triggers the Javascript character replacement process and uses our array of character substitutes to reconstitute our intended email address.

Javascript methods may be deployed globally using a complete alphanumeric-symbolic substitution array, or a reduced set on a per-page basis. This choice is made by the programmer based on web server performance, extensibility for the number of email addresses to be used, etc. Regardless of the scope or deployment, the Javascript character replacement method dynamically replaces the obfuscated mailto hyperlink segment with the correct alphanumeric-symbolic characters, then triggers the mailto command, resulting in a normal and expected email address launch.
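The numeric substitution scheme from item 2 (A=1, B=2, C=3, ...) can be sketched as an encode/decode pair. In a real page the decoding half would run as Javascript in the browser; it is shown in Python here only to keep this document's examples in one language. The table is an assumption: the lowercase alphabet as described, extended with digits and the symbols @ . - _ so that a complete address can be represented.

```python
import string

# Hypothetical substitution table: position in this string + 1 is the code,
# so a=1, b=2, ..., z=26, then digits 0-9, then a few email symbols.
ALPHABET = string.ascii_lowercase + string.digits + "@.-_"


def encode_address(address):
    """Turn an email address into the list of numbers embedded in the page."""
    codes = []
    for ch in address.lower():  # the table only covers lowercase letters
        if ch not in ALPHABET:
            raise ValueError(f"character {ch!r} has no substitute")
        codes.append(ALPHABET.index(ch) + 1)
    return codes


def decode_address(codes):
    """Rebuild the address; the browser-side Javascript would do the same."""
    return "".join(ALPHABET[n - 1] for n in codes)


codes = encode_address("bob@hates-spam.com")
print(codes[:4])              # [2, 15, 2, 37]
print(decode_address(codes))  # bob@hates-spam.com
```

Only the list of numbers appears in the page, so there is no @ or dotted domain for a pattern-matching spamBot to find until the click handler rebuilds the address.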

3. Less sophisticated methods of Javascript character handling can be used, such as reconstituting an email address in chunks. The Javascript code may break the email address into multiple objects, such as “bob” in the first, the @ symbol in the code, “hates”, “-” and “spam” in another set of objects, and “.com” in another. When the user clicks the mailto link, Javascript assembles the pieces on the fly and triggers the mailto event.
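The chunked approach in item 3 can be sketched as a small Python helper that emits the link plus an inline Javascript click handler. The element id, the link label and the exact markup are assumptions; json.dumps is used only to turn the list of chunks into a valid Javascript array literal.

```python
import json


def chunked_mailto_snippet(chunks, link_id="contact-link", label="Email us"):
    """Return an HTML link and inline script that join the chunks on click.

    The complete address never appears in the page source, only its pieces.
    """
    pieces = json.dumps(list(chunks))  # e.g. ["bob", "@", "hates", ...]
    return (
        f'<a href="#" id="{link_id}">{label}</a>\n'
        "<script>\n"
        f'document.getElementById("{link_id}").onclick = function () {{\n'
        f"  var parts = {pieces};\n"
        '  window.location.href = "mailto:" + parts.join("");\n'
        "  return false;\n"
        "};\n"
        "</script>"
    )


print(chunked_mailto_snippet(["bob", "@", "hates", "-", "spam", ".com"]))
```

When the visitor clicks the link, the script joins the pieces and navigates to the mailto: URL, which opens the visitor's mail client as a normal mailto link would.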

4. One of the more efficient methods of blocking email address recognition is using Flash media. Search engines and spamBots alike are not yet capable of interpreting the content of a Flash movie. Flash movies can have rather small dimensions and fit into your web site design efficiently. The file size can be extremely small and yet the movie can be extremely functional. In most cases, a small Flash movie inline within an expected text block would not look clean or acceptable. But in sections of the layout where the email address can reside on its own, Flash is a great solution to protect your email address.

5. Don’t publicize your company email addresses at all! There are two ways to protect email addresses. The first is to provide a temporary contact email address in your website (hopefully obfuscated) that can be changed periodically should spam start showing up. Email addresses are super cheap and easy to manage, so use them more often. The email address you publish is not likely your personal email address, and is likely only going to be used by a web user while looking at your website. Once you establish communications with that customer, you are likely to provide direct contact information, including a more personal email address.

6. The next method of not using your email address is to provide an Online form, which submits the content to you behind the scenes via email. The form can provide an easier method of contact and communication for the customer if you use dropMenus and a preselected list of options specific to your products and services. In addition, contact forms let your web server differentiate between the selected subjects and send email to different team members. Let’s say your Subject dropMenu had three options: Request A Brochure, Ask A Question, Voice A Problem. These three Subject options would surely be delivered to different team members, which allows them to be more efficient at responding to the sender.
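Routing by Subject can be as simple as a lookup table on the server. The sketch below is in Python rather than the PHP the post has in mind, and the mailbox addresses are placeholders. Because the dropMenu only offers fixed values, it treats any other Subject as a submission that did not come from the real form.

```python
# Hypothetical mapping from the Subject dropMenu options to team mailboxes;
# the addresses are placeholders.
ROUTES = {
    "Request A Brochure": "sales@example.com",
    "Ask A Question": "support@example.com",
    "Voice A Problem": "customer-care@example.com",
}


def route_submission(subject):
    """Return the mailbox for a Subject value, rejecting anything unexpected."""
    try:
        return ROUTES[subject]
    except KeyError:
        # The dropMenu only offers fixed values, so anything else was not
        # sent by the real form (for example, a formBot posting directly)
        raise ValueError(f"unexpected subject: {subject!r}") from None


print(route_submission("Ask A Question"))  # support@example.com
```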

Caveats include formBot abuse to send spam. formBots are special spamBots that cruise the Internet looking for contact forms to submit spam through. This is another area where webmasters have created the problem by not integrating security programming into their forms, assuming everyone will play nicely. Get your programmer involved in ANY aspect of communication with the user, especially a programmer who understands web security well. Web Security Experts are a special breed of programmers, separate from regular programmers, and a world apart from webmasters.

formBots look for contact forms and submit their spam to you, sometimes in bucket loads. If the formBot is successful sending one form submission, why not send a few thousand? This converts the formBot function from simple spam to attempted DoS (Denial of Service), as you’ll be so inundated with contact form submissions that you won’t easily find the real ones to respond to. By simply adding some healthy PHP programming to cleanse and validate form submissions, you can prevent formBots from successfully submitting anything at all. You can also track the IP address used to attack your system, blacklist it, report abuse to the attacker’s network provider, etc. Don’t just filter communications on one field; filter them all and reject any submission that meets your criteria for spam or abuse.
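Here is a minimal sketch of "cleanse and validate every field" for a contact form. It is written in Python to match this document's other examples, though the same checks translate directly to PHP. The field names, the hidden "website" honeypot field and the rate-limit numbers are all assumptions. The CR/LF check guards against mail-header injection, and the per-IP window limits the flood described above. Blacklisting and abuse reporting are not shown.

```python
import re
import time
from collections import defaultdict, deque

# Assumed policy values; tune them for your own site
MAX_SUBMISSIONS = 3   # accepted submissions per IP address ...
WINDOW_SECONDS = 600  # ... within this many seconds
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately loose check

_recent = defaultdict(deque)  # IP address -> timestamps of accepted submissions


def validate_submission(form, ip, now=None):
    """Return a list of problems; an empty list means the submission is accepted."""
    now = time.time() if now is None else now
    problems = []

    # Honeypot: a field hidden with CSS that humans leave empty but bots fill in
    if form.get("website", ""):
        problems.append("honeypot filled")

    for field in ("name", "email", "message"):
        if not form.get(field, "").strip():
            problems.append(f"missing {field}")

    # CR/LF in fields copied into mail headers enables header injection
    for field in ("name", "email"):
        if re.search(r"[\r\n]", form.get(field, "")):
            problems.append(f"newline in {field}")

    email = form.get("email", "")
    if email and not EMAIL_RE.fullmatch(email):
        problems.append("invalid email")

    if form.get("message", "").lower().count("http") > 2:
        problems.append("too many links")

    # Sliding-window rate limit per IP address
    window = _recent[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_SUBMISSIONS:
        problems.append("rate limit exceeded")

    if not problems:
        window.append(now)
    return problems
```

Returning a list of reasons rather than a single yes/no makes it easy to log why a submission was rejected, which helps when deciding which IP addresses to blacklist.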

7. Blogs, chatrooms, and other Online social websites allow signups. To sign up you must provide an email address. This is a valid and healthy method for those websites to prevent formBots from signing up bucket loads of fake users, all presenting spam information in their profiles and similar. You are supposed to receive an email asking you to verify your address and accept the membership, else the membership never activates. All of this is healthy and expected.

So, you have yourself a social website membership and you want to share with others. That’s really nice of you and we appreciate your information, assuming it’s correct. But, did you stop to consider whether that website posts your email address as part of your identity? Have you checked to see if that website sells their email lists to other companies as part of their revenue stream? Most reputable social websites do not distribute or post your email address, but there is a simple method for being sure.

Since almost all web hosts that sell email service provide far more email addresses than you need, create some fake ones for this purpose. Go Daddy basic web hosting accounts costing as little as $2.80 per month supply as many as 100-500 email addresses for one domain name. Let’s say you’re signing up for the Charlie Chick-Chocks Rotisserie Grill website because he has a free giveaway or a newsletter with coupons. His website is www.charliesgrill.com and you need to provide an email address that is real, and from which you can validate your account. Create an email address in your email service called charliesgrill@yourdomain.com, where yourdomain.com is whatever your domain name really is. Now you can sign up, verify your email address, and get your cool coupons for greasy ribs or whatever turns your gut. If Charlie sells the list of email addresses to Pamela’s Pink Panties store, and she starts sending you pantyhose emails, you’ll quickly recognize who sold your email address by the address she sends to.
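The per-site address trick can be sketched in a few lines. The helper names are hypothetical, and deriving the alias from the first label of the site's host name is a simplification (it ignores subdomains such as shop.example.com). The second function flags mail that arrives at a per-site alias from a domain that does not match the site it was given to.

```python
from urllib.parse import urlparse


def alias_for_site(site, your_domain):
    """'www.charliesgrill.com' -> 'charliesgrill@<your_domain>'.

    Assumes the first label after an optional 'www.' names the site.
    """
    host = urlparse(site if "//" in site else "//" + site).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return f"{host.split('.')[0]}@{your_domain}"


def leak_source(recipient, sender):
    """Return the site an alias was given to if the mail comes from elsewhere."""
    site_label = recipient.split("@", 1)[0].lower()
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    return None if site_label in sender_domain else site_label


alias = alias_for_site("www.charliesgrill.com", "yourdomain.com")
print(alias)                                       # charliesgrill@yourdomain.com
print(leak_source(alias, "offers@example.net"))   # charliesgrill
```

A None result means the mail came from the site you signed up with; anything else names the site that handed your alias to someone else.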

If you create MANY fake email addresses, which we do for this exact purpose, you might forward all of them into a bulk email account where you can easily read them from one location. Odds are you won’t be using those addresses for regular correspondence, so a bulk email account is a best practice.

We’ve created hundreds of fake email addresses to prevent our real addresses from being distributed, as well as to test sites for the redistribution issue. Not a single email address has circled back via the wrong domain name (sender), which makes us feel either safe with these sites, or that the method worked. The sending company could easily filter out email addresses that contain references to their domain or company name, but heck, that’s really what we wanted, right?

Whatever the creative solution you choose to concoct, just make sure you avoid making life easy for spamBots and formBots! We need to work together to kick spamBots in the pants and take money away from spamBot programmers.



Source: http://blog.phpkemist.com/2008/02/23/preventing-spambots-from-harvesting-or-scraping-email-addresses-from-web-pages/

Monday, 27 May 2013

Social Engineering – Scraping Data from Linkedin

Summary: A method and scripts to grab bulk data from Linkedin profiles and format it, using Burp Suite, curl, grep and cut. In this case, the goal was to create a username list for identifying emails and domain accounts.

Foundation:
I was performing a relatively unique task for a social engineering engagement for a client. Normally I’ll just receive a list of email accounts and/or phone numbers of specific users the client wishes to test. In this case they didn’t want to provide ANY information at all. They wanted to see what I would be able to find and then target those users.

I started with the usual Google searches looking for pertinent data and found a little. I also used metagoofil and theHarvester, which turned up about 20 valid accounts. During my googling I found a very interesting portal page that allowed users to reset their domain passwords. I wasn’t interested in brute forcing any accounts (yet), but was able to use the functionality to test for valid accounts. I browsed to a webpage detailing some of the executives at the company and tried varying combinations of their names to find the format they used to create accounts. It turned out to be first initial plus last name, not surprisingly.

I then turned my attention to Linkedin and found over 1800 existing employees. If I could just grab all the names of employees and then format them I could then fire this list of usernames at the portal page to get a large list of valid user accounts. How best to do this?

Unfortunately Linkedin is one of the worst designed websites for automating this. If I were able to change the number of results per page I could simply do this manually and it wouldn’t take long. For example if I could return 100 results per page that would only be 18 pages to save manually and then grep out the profile names. That wouldn’t be so bad to save 18 pages manually. Unfortunately Linkedin has it hardcoded that you can only view 10 results per page. It looks like the limit might be 25 results for the Linkedin API, but the actual website appears to be limited to 10 per page. That means I’d have to browse to and save 180 pages manually, too much work. Thus trying to automate it with a script to crawl through each page, saving the output, looked like the best option.

To do this I used the Intruder module of Burp Suite. I also needed a paid Linkedin account; otherwise you only see each person’s first name and last initial. I borrowed an account (legitimately) from a friend and logged into Linkedin through Burp, whose proxy intercept feature captured the requests. I found the request for the search results page in the history, right-clicked it and chose ‘send to intruder’.

On the Positions tab for Intruder you can see the HTTP request from the client. There are many variables in this GET request, so the first step is to remove all of them with the ‘Clear §’ button, which clears every position that Intruder would manipulate. Next, select the page_num variable and click the ‘Add §’ button.
Note that I changed the variable Keyword=ORG_NAME to protect the client; in reality it was just the organization’s name. The attack type doesn’t necessarily matter for this test because we’re only manipulating a single variable. For the differences between the attack types, check out the PortSwigger website.

Now select the Payloads tab and choose Numbers in the payload set dropdown. This section is pretty self-explanatory: we want the payload to walk through every page number from 1 to 180, and the step defines how much it increments each time (1 here). Once you’re ready, click Intruder -> Start Attack.

Once the attack has completed you can highlight all the requests, right click and choose ‘save selected items’. Choose a folder and all the contents of the requests will be saved in one file. This works perfectly for what we’re trying to do as we can simply grep out the first and last name.
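The post's own grep/cut commands are not shown. As a hedged sketch, once the full names have been pulled out of the saved Intruder output, they could be converted into the first-initial-plus-last-name format discovered earlier like this. The function names are hypothetical. Lowercasing and dropping punctuation and non-ASCII characters are assumptions about how the organization builds usernames. Name suffixes such as "Jr." are not handled, and two employees who map to the same username are kept only once.

```python
import re


def first_initial_last_name(full_name):
    """'Jane Q. Doe' -> 'jdoe'; None if the name has fewer than two parts."""
    parts = [re.sub(r"[^a-z0-9]", "", p) for p in full_name.lower().split()]
    parts = [p for p in parts if p]  # drop parts that were only punctuation
    if len(parts) < 2:
        return None
    return parts[0][0] + parts[-1]


def build_username_list(full_names):
    """Convert names to usernames, keeping first-seen order and dropping repeats."""
    seen, usernames = set(), []
    for name in full_names:
        username = first_initial_last_name(name)
        if username and username not in seen:
            seen.add(username)
            usernames.append(username)
    return usernames


print(build_username_list(["Jane Q. Doe", "John Doe", "Sean O'Brien"]))
# ['jdoe', 'sobrien']
```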


Source: http://twrightson.wordpress.com/2012/08/05/social-engineering-scraping-data-from-linkedin/

Thursday, 16 May 2013

Scraping LinkedIn Public Profiles for Fun and Profit

Reconnaissance and Information Gathering is a part of almost every penetration testing engagement. Often, the tester will only perform network reconnaissance in an attempt to disclose and learn the company's network infrastructure (e.g. IP addresses and domain names), but there are other types of reconnaissance to conduct, and no, I'm not talking about dumpster diving. Thanks to social networks like LinkedIn, OSINT/WEBINT is now yielding more information. This information can then be used to help the tester test anything from social engineering to weak passwords.

In this blog post I will show you how to use Pythonect to easily generate potential passwords from LinkedIn public profiles. If you haven't heard about Pythonect yet, it is a new, experimental, general-purpose dataflow programming language based on the Python programming language. Pythonect is most suitable for creating applications that are themselves focused on the "flow" of the data. An application that generates passwords from a given company's employees' public LinkedIn profiles has a coherent and clear dataflow:

(1) Find all the employees' public LinkedIn profiles → (2) Scrape all the employees' public LinkedIn profiles → (3) Crunch all the data into potential passwords

Now that we have the general concept and high-level overview out of the way, let's dive in to the details.

Finding all the employees' public LinkedIn profiles will be done via Google Custom Search Engine, a free service by Google that allows anyone to create their own search engine. The idea is to create a search engine that, when searching for a given company name, will return all the employees' public LinkedIn profiles. How? When creating a Google Custom Search Engine it's possible to restrict the search results to a specific site (i.e. 'Sites to search'), and we're going to limit ours to linkedin.com. It's also possible to fine-tune the search results even further, e.g. uk.linkedin.com to find only employees from the United Kingdom.

The newly created Google Custom Search Engine will be accessed using a free API key obtained from the Google API Console. Why go through the Google API? Because it allows automation (no CAPTCHAs), and it also means that the search-result pages will be returned as JSON (as opposed to HTML). The only catch with the free API key is that it's limited to 100 queries per day, but it's possible to buy an API key that will not be limited.
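Each request to the API is a plain GET with the key, cx (search engine ID), q (query) and, for later pages, start parameters, the same ones this post's scraper uses. The search queries contain spaces and double quotes, so they should be URL-encoded. Below is a minimal Python 3 sketch using urllib.parse; the key and cx values are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query, start=None):
    """Build a Custom Search JSON API request URL with every parameter URL-encoded."""
    params = {"key": api_key, "cx": cx, "q": query}
    if start is not None:
        params["start"] = start  # 1-based index of the first result to return
    return API_ENDPOINT + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", "YOUR_CX", '"at Pythonect" inurl:"in"', start=11)
# Spaces and double quotes in the query are escaped and decode back intact
assert parse_qs(urlparse(url).query)["q"] == ['"at Pythonect" inurl:"in"']
print(url)
```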

Scraping the profiles is a matter of iterating over all the hCards in all the search-result pages and extracting the employee name from each hCard. What is an hCard? hCard is a microformat for publishing the contact details of people, companies, organizations, and places. hCard is also supported by social networks such as Facebook, Google+ and LinkedIn for exporting public profiles. Google parses hCards when indexing and, when relevant, uses them in search-result pages. In other words, when search-result pages include LinkedIn public profiles, they will appear as hCards and can be easily parsed.

Let's see the implementation of the above:

#!/usr/bin/python
#
# Copyright (C) 2012 Itzik Kotler
#
# scraper.py is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# scraper.py is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with scraper.py.  If not, see <http://www.gnu.org/licenses/>.

"""Simple LinkedIn public profiles scraper that uses Google Custom Search"""

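import urllib
import simplejson


BASE_URL = "https://www.googleapis.com/customsearch/v1?key=<YOUR GOOGLE API KEY>&cx=<YOUR GOOGLE SEARCH ENGINE CX>"


def __get_all_hcards_from_query(query, index=0, hcards=None):

    # Use None rather than a mutable {} default: a shared default dict would
    # carry results over from one call (and one company) to the next
    if hcards is None:
        hcards = {}

    url = query

    # 'start' selects the result page; the first page needs no offset
    if index != 0:
        url = url + '&start=%d' % (index)

    response = simplejson.loads(urllib.urlopen(url).read())

    if 'error' in response:
        print "Stopping at %s due to Error!" % (url)
        print response

    else:
        # 'items' is absent when a page has no results
        for item in response.get('items', []):
            try:
                # Map the hCard full name (fn) to the job title
                hcard = item['pagemap']['hcard'][0]
                hcards[hcard['fn']] = hcard['title']
            except KeyError:
                # Result without an hCard (or without fn/title): skip it
                pass

        # Follow pagination until Google stops returning a nextPage entry
        if 'nextPage' in response['queries']:
            return __get_all_hcards_from_query(query, response['queries']['nextPage'][0]['startIndex'], hcards)

    return hcards


def get_all_employees_by_company_via_linkedin(company):

    queries = ['"at %s" inurl:"in"', '"at %s" inurl:"pub"']

    result = {}

    for query in queries:
        _query = query % company
        # URL-encode the query: it contains spaces and double quotes
        result.update(__get_all_hcards_from_query(BASE_URL + '&q=' + urllib.quote(_query)))

    # list() of a dict returns its keys, i.e. the employees' full names
    return list(result)
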
Replace <YOUR GOOGLE API KEY> and <YOUR GOOGLE SEARCH ENGINE CX> in the code above with your Google API Key and Google Search Engine CX respectively, save it to a file called scraper.py, and you're ready!

To kick-start, here is a simple program in Pythonect (that utilizes the scraper module) that searches for and prints the full names of all the Pythonect company employees:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> print
The output should be:
Itzik Kotler
In my LinkedIn Profile, I have listed Pythonect as a company that I work for, and since no one else is working there, when searching for all the employees of Pythonect company - only my LinkedIn profile comes up.
For demonstration purposes I will keep using this example (i.e. "Pythonect" company, and "Itzik Kotler" employee), but go ahead and replace Pythonect with other, more popular, company names and see the results.

Now that we have a working skeleton, let's take its output and start crunching it. Keep in mind that every "password generation formula" is merely a guess. The examples below are only a sampling of what can be done. There are, obviously, many more possibilities and you are encouraged to experiment. But first, let's normalize the output, so that it's consistent before any operations are performed on it:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split()))
The normalization procedure is short and simple: convert the string to lowercase and remove any spaces, so the output should now be:
itzikkotler
As for data manipulation, out of the box (thanks to the Python Standard Library) we've got itertools and its combinatoric generators. Let's start by applying itertools.product:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split())) -> itertools.product(_, repeat=4) -> print
The code above will generate and print every 4-character password (each produced as a tuple of characters) drawn from the letters i, t, z, k, o, l, e, r. However, it won't cover passwords with uppercase letters in them. So here's a simple and straightforward implementation of a cycle_uppercase function that iterates over the input letters and, for each position, yields a copy of the input with that letter in uppercase:
   
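def cycle_uppercase(i):
    # i is one tuple of characters from itertools.product, e.g. ('i', 't', 'z', 'i')
    s = ''.join(i)
    # Yield one copy of the string per position, with only that letter uppercased
    for idx in xrange(0, len(s)):
        yield s[:idx] + s[idx].upper() + s[idx+1:]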
To use it, save it to a file called itertools2.py, and then simply add it to the Pythonect program after the itertools.product(_, repeat=4) block, as follows:

   
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> print
Now the program will also cover passwords that include a single uppercase letter. Moving on with the data manipulation, sometimes the password might contain symbols that are not found within the scraped data. In this case, it is necessary to take the input and add symbols to it. Here is a short and simple implementation as a list comprehension:
[_ + postfix for postfix in ['123','!','$']]
To use it, simply add it to the Pythonect program after the itertools2.cycle_uppercase block, as follows:
   
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> [_ + postfix for postfix in ['123','!','$']] \
                    -> print
The program now appends the strings '123', '!', and '$' to every generated password, which may or may not increase the chances of guessing the user's actual password, depending on the password :)
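The list grows quickly. Assuming Pythonect passes each item a generator yields downstream individually (as the printed output described above implies), the counts for the "itzikkotler" example work out as follows. The normalized name has 11 characters but only 8 distinct letters (i, t, z, k, o, l, e, r), so itertools.product also emits many repeated candidates:

```latex
\begin{aligned}
\text{itertools.product}(\_, \text{repeat}=4)&: 11^4 = 14{,}641 \text{ tuples} \quad (8^4 = 4{,}096 \text{ distinct})\\
\text{cycle\_uppercase}&: 14{,}641 \times 4 = 58{,}564 \quad (4{,}096 \times 4 = 16{,}384 \text{ distinct})\\
\text{3 suffixes}&: 58{,}564 \times 3 = 175{,}692 \quad (16{,}384 \times 3 = 49{,}152 \text{ distinct})
\end{aligned}
```

Roughly 72% of the printed candidates (1 - 49,152/175,692 ≈ 0.72) are therefore repeats.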

To summarize, it's possible to take OSINT/WEBINT data on a given person or company and use it to generate potential passwords, and it's easy to do with Pythonect. There are, of course, many different ways to manipulate the data into passwords and many programs and filters that can be used. In this aspect, Pythonect being a flow-oriented language makes it easy to experiment and research with different modules and programs in a "plug and play" manner.

Source: http://blog.ikotler.org/2012/12/scraping-linkedin-public-profiles-for.html

Monday, 6 May 2013

Data Extraction using product Catalog data extractor

Many sales and marketing users look for fast solutions and information about various companies' products. One way to get quick and accurate information is the Product Catalog data extractor tool. This tool is available at www.websitescraper.com and is widely used. Information about product features, available colors and sizes, customer reviews and feedback can be extracted with it. This information can be quite handy in assessing a company's current product portfolio. The tool can extract large amounts of data quickly, and the data can be interpreted to build useful reports and support strategic decision making. The data can be stored in various formats, such as MS Access, MS Excel, HTML and CSV files.

Product Catalog website scraper and its functionality

The Product Catalog website scraper is one such data analytics tool that can be used to gather information about a business's products and services. The tool is featured at Scraping Intelligence and can copy bulk amounts of data from the parent website and compile and present this information in a screen format. It is therefore used by many business experts in the marketing field who want to track customer behavior toward a particular group of products. Using this tool, they can get accurate information in very little time.

Quick Data Retrieving from product Catalog screen scraper

Scraping Intelligence also has an online data analytics tool that works exceptionally well for image and picture data. The Product Catalog screen scraper is very effective at collecting and copying picture data from product catalog databases and web pages and presenting and interpreting this information in the way the user specifies. Often, image data is more handy and useful than text data, and in such cases the Product Catalog screen scraper will prove quite useful.

Source: http://www.websitescraper.com/scrape-data-for-creating-product-catalogues.php

Friday, 3 May 2013

Web data Scraping is the most effective offers

Every growing business needs a way to reduce, significantly, the time and financial resources that it dedicates to handling its growing informational need. Web Data Scraping offers the most effective yet very economical solution to the data loads that your company has to handle constantly. The variety of handling services from this company includes data scraping, web scraping and website scraping.

The company offers valuable and efficient website data scraping software that will enable you to scrape all the relevant information you need from the World Wide Web. The extracted information is valuable to a variety of production, consumption and service industries. For online price comparison, website change detection, research, weather data monitoring, web data integration, web mashups and many more uses, the web scraping software from Web Data Scraping is the best bet you can find in the web scraping market.

The software that this company offers handles all web harvesting and website scraping in a manner that closely simulates a human exploring the websites you want to scrape. Web data extraction from Webdatascraping.us works at a high level over HTTP and can fully embed popular browsers such as Mozilla, as well as proprietary ones.

The data scraping technology from Web Data Scraping has the capability to bypass all the technical measures that the institutional owners of websites implement to stop bots. Imagine paying for web scraping software that cannot get past the blocks set up by the very websites whose information you need. This company guarantees that no excess-traffic monitoring, IP address blocking or robots.txt entries will be able to prevent it from functioning. In addition, many website scraping crawlers are easily detected and blocked by commercial anti-bot tools like distil, sentor and siteblackbox. Web Data Scraping cannot be prevented by any of these, nor, most importantly, by verification software such as CAPTCHAs.

We have expertise in the following services, which you can ask us about:
- Contact Information Scraping from Website.
- Data Scraping from Business Directory – Yellow pages, Yell, Yelp, Manta, Super pages.
- Email Database Scraping from Website/Web Pages.
- Extract Data from eBay, Amazon, LinkedIn, and Government Websites.
- Website Content, Metadata scraping and Information scraping.
- Product Information Scraping – Product details, product price, product images.
- Web Research, Internet Searching, Google Searching and Contact Scraping.
- Form Information Filling, File Uploading & Downloading.
- Scraping Data from Health, Medical, Travel, Entertainment, Fashion, Clothing Websites.

For every company or organization, surveys and market research play an important role in strategic decision making, and data extraction and web technology support that process. Scraping is an important instrument for collecting relevant data and information for personal or commercial use. Many companies have people copy and paste data from web pages manually, which is time-consuming and wasteful and makes the process too expensive. Scraping spends fewer resources, takes less time, and collects data that is very reliable.

Nowadays, data mining companies use effective web information technology to crawl thousands of websites and specific pages, and the scraped data is saved in the required format, such as a CSV file, a database or an XML file. After the data is collected and stored, data mining can reveal hidden patterns, trends and correlations that can be used to formulate policies and make decisions. The data is also stored for future use.

Source: http://www.selfgrowth.com/articles/web-data-scraping-is-the-most-effective-offers

Note:

Delta Ray is an experienced web scraping consultant and writes articles on LinkedIn email scraping, LinkedIn profile scraping, TripAdvisor data scraping, Amazon data scraping, Yellow Pages data scraping and product information scraping.