Monday 27 May 2013

Social Engineering – Scraping Data from LinkedIn

Summary: A method and scripts for grabbing bulk data from LinkedIn profiles and formatting it, using Burp Suite, curl, grep and cut. In this case the goal was to build a username list for identifying email addresses and domain accounts.

Foundation:
I was performing a somewhat unusual task during a social engineering engagement for a client. Normally I'll just receive a list of email accounts and/or phone numbers for the specific users the client wishes to test. In this case they didn't want to provide ANY information at all: they wanted to see what I would be able to find, and then have me target those users.

I started with the usual Google searches looking for pertinent data and found a little. I also used metagoofil and theHarvester, which turned up about 20 valid accounts. During my googling I found a very interesting portal page that allowed users to reset their domain passwords. I wasn't interested in brute forcing any accounts (yet), but I was able to use this functionality to test for valid accounts. I browsed to a webpage detailing some of the executives at the company and tried varying combinations of their names to find the format used to create accounts. Not surprisingly, it turned out to be first initial plus last name.
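
To give a feel for that format-guessing step, here's a minimal sketch in bash. The portal URL, the parameter name and the "user not found" string are placeholders invented for the example; the real portal obviously looked different.

# Generate common username formats for one known employee ("John Smith" here)
# and try each against the password-reset portal. URL, parameter and error
# string are hypothetical placeholders.
FIRST="john"; LAST="smith"
PORTAL="https://portal.example.com/reset"

for CANDIDATE in "${FIRST:0:1}${LAST}" "${FIRST}.${LAST}" "${FIRST}${LAST:0:1}" "${FIRST}_${LAST}"; do
  # Submit the candidate and check whether the portal recognises the account
  if curl -s -d "username=${CANDIDATE}" "$PORTAL" | grep -qi "user not found"; then
    echo "$CANDIDATE : not a valid account"
  else
    echo "$CANDIDATE : possibly valid"
  fi
done

Whichever candidate doesn't come back with the error text tells you the naming convention the organization uses.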

I then turned my attention to LinkedIn and found over 1,800 existing employees. If I could grab all of the employee names and format them, I could fire the resulting list of usernames at the portal page and get a large list of valid user accounts. How best to do this?

Unfortunately LinkedIn is one of the worst-designed websites for automating this. If I were able to change the number of results per page, I could simply do this manually and it wouldn't take long: at 100 results per page there would only be 18 pages to save and then grep the profile names out of, which wouldn't be so bad. Unfortunately LinkedIn hardcodes the website to 10 results per page (the LinkedIn API looks like it might allow 25, but the actual website appears limited to 10). That means I'd have to browse to and save 180 pages manually, which is too much work. So automating it with a script that crawls through each page and saves the output looked like the best option.
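
(As an alternative to the Burp approach described below, the crawl could also be scripted directly with curl, along the lines of the sketch here. The search path, the parameters and the cookie are placeholders; the real request and its session cookies would need to be copied from a logged-in session, e.g. out of Burp's proxy history.)

# Fetch all 180 result pages and append them to one file.
# Path, parameters and cookie below are placeholders only.
COOKIES='li_at=SESSION_COOKIE_VALUE'
for PAGE in $(seq 1 180); do
  curl -s -b "$COOKIES" \
    "https://www.linkedin.com/search/results?Keyword=ORG_NAME&page_num=${PAGE}" \
    >> search_results.html
  sleep 2   # don't hammer the site
done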

For the actual crawl I used the Intruder module of Burp Suite. I also needed a paid LinkedIn account, because otherwise search results only show each person's first name and last initial. I borrowed an account (legitimately) from a friend and logged into LinkedIn, capturing the requests with Burp's proxy intercept feature. I then found the request for the search results page in the history, right-clicked it and chose 'Send to Intruder'.

On the Positions tab in Intruder you can see the HTTP request from the client. There are many variables in this GET request, so the first step is to clear all of the payload markers with the 'Clear §' button; this removes every position that Intruder would otherwise manipulate. Next, select the page_num variable and click the 'Add §' button so it becomes the only payload position.
Note that I changed the value to Keyword=ORG_NAME to protect the client; in reality it was just the organization's name. The attack type doesn't really matter for this test because we're only manipulating a single variable; for the differences between the attack types check out the PortSwigger website.
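
After clearing and re-adding, the request line ends up looking something like this (the path and the other parameters are simplified here; only page_num is wrapped in § markers):

GET /search/results?Keyword=ORG_NAME&page_num=§1§ HTTP/1.1
Host: www.linkedin.com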

Now select the Payloads tab and choose 'Numbers' as the payload type. This section is pretty self-explanatory: we want the payload to walk through every page number from 1 to 180, and the step defines how much it increments each time (1 in this case). Once you're ready, click Intruder -> Start attack.

Once the attack has completed you can highlight all of the requests, right-click and choose 'Save selected items'. Pick a location and the contents of all the requests will be saved to one file. This works perfectly for what we're trying to do, as we can simply grep the first and last names out of it.
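
Exactly what that looks like depends on the markup in the saved responses, but the idea is a short pipeline along these lines (the grep pattern below is a placeholder, and the awk step is just one way to get the first-initial-plus-last-name format):

# Pull "First Last" pairs out of the saved Intruder output and turn them
# into first-initial-plus-last-name usernames. Adjust the grep pattern to
# whatever markup actually surrounds the names in the saved file.
grep -oE '<span class="given-name">[^<]+</span> <span class="family-name">[^<]+</span>' saved_items.txt \
  | sed -E 's/<[^>]+>//g' \
  | tr 'A-Z' 'a-z' \
  | awk '{ print substr($1,1,1) $2 }' \
  | sort -u > usernames.txt

The resulting usernames.txt can then be fired at the password-reset portal to confirm which accounts actually exist.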


Source: http://twrightson.wordpress.com/2012/08/05/social-engineering-scraping-data-from-linkedin/
