Undergrad Projects
From NetSysLab
If you are an undergraduate student interested in one of the following EECE496 projects, please contact Matei Ripeanu.
Social Computing
Twitter Cyborgs
Related project: Socialbots (collaboration with LERSEE). This project was suggested by Miranda Mowbray from HP Labs.
Twitter cyborgs are automated or partially-automated accounts on the Twitter micro-blogging service that send large volumes of Twitter messages, typically for marketing purposes. Twitter cyborgs are different from Socialbots in the sense that they are not designed to pass themselves off as human beings. However, they can be disruptive even when their marketing is opt-in only.
The first goal of this project is to investigate whether tweets produced by humans are more likely to be retweeted than tweets produced by cyborgs. This involves applying a technique to detect cyborgs on Twitter and characterizing the activity they generate (e.g., tweet and retweet frequency).
A second goal is to explore the degree to which users’ social importance (e.g., as measured by their TunkRank, an adaptation of Google’s PageRank) correlates with the average retweet probability of their tweets.
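As a rough illustration of the second goal, the sketch below computes a Spearman rank correlation between a per-user importance score and a per-user average retweet rate. The function names and the sample numbers are hypothetical, and the choice of Spearman (rather than another correlation measure) is an assumption made here for illustration, not something the project prescribes.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    std_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one importance score and one retweet rate per user.
importance = [0.5, 2.0, 3.5, 9.0]
retweet_rate = [0.01, 0.02, 0.05, 0.40]
rho = spearman(importance, retweet_rate)
```

A rank-based measure is used in this sketch because importance scores of this kind tend to be heavily skewed; with real data, the inputs would come from the collected tweets rather than hand-written lists.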
A number of other, more ambitious investigations are possible once you have gathered enough data. For example, use a machine learning approach to discover which features of a tweet (e.g., URL, hashtag, cyborg or human author, TunkRank, "influence", number of followers, commonly retweeted words) best predict how often it will be retweeted, and how good a prediction a combination of these features can achieve.
Additional reading/links: “Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network”; “Do sponsored tweets work?”; TunkRank API; Twitter Cyborgs Slides; "The Twittering Machine"
Data Collection and Analysis on Mobile Social Applications
My Tracks is a popular application that enables users to record their performance while running or hiking, for example. If one considers an online social network for street runners, how would one devise an algorithm to compare tracks shared by many users? For example, suppose the goal is to find users who run on tracks with similar characteristics in the same city, but in different neighborhoods.
The goal of this project is twofold:
1. Gain experience with programming for Android by modifying MyTracks to record other "interesting" information, such as the music users play while running, to help find "similar" users in the runner social network.
2. Learn and implement algorithms to calculate the similarity of shared tracks, and of users according to their running profiles.
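One possible starting point for the second item is to compare tracks by location-independent features, so that runs in different neighborhoods can still match. The sketch below is only an assumption about how this might look: it treats a track as a list of (latitude, longitude, elevation in metres) points, derives total length and total climb, and scores two tracks by their relative feature differences. The feature set, the scoring formula, and all names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon, ...) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_features(track):
    """Return (length_km, climb_m) for a list of (lat, lon, elevation_m)."""
    pairs = list(zip(track, track[1:]))
    length_km = sum(haversine_km(a, b) for a, b in pairs)
    climb_m = sum(max(0.0, b[2] - a[2]) for a, b in pairs)
    return length_km, climb_m


def similarity(track_a, track_b):
    """Score in [0, 1]; 1 means identical length and climb."""
    diffs = []
    for x, y in zip(track_features(track_a), track_features(track_b)):
        largest = max(x, y)
        diffs.append(0.0 if largest == 0 else abs(x - y) / largest)
    return 1.0 - sum(diffs) / len(diffs)


# Hypothetical tracks: the same shape recorded in two different places.
track_a = [(0.0, 0.00, 10.0), (0.0, 0.01, 20.0), (0.0, 0.02, 15.0)]
track_b = [(0.0, 0.50, 10.0), (0.0, 0.51, 20.0), (0.0, 0.52, 15.0)]
score = similarity(track_a, track_b)
```

Richer profiles (pace, time of day, or the extra data recorded in the first item) could be added as further features under the same scheme.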
Analyzing Influence in Online Social Networks
Related project: Socialbots (collaboration with LERSEE).
Online social systems are commonplace in today's World Wide Web. Understanding how such networks evolve over time, and how topology characteristics relate to social influence, is essential to designing a number of mechanisms. For instance, by detecting influential users via their position in the network, one may devise better advertising campaigns.
The goal of this project is three-fold: First, design and implement graph algorithms to compute node centrality in online social networks by using publicly available traces. Second, design and implement social influence detection algorithms. Third, analyze whether the position of an individual in the network provides information about how influential the individual can be. Depending on the outcomes, our homemade Socialbots will be enhanced with the ability to engineer a social position that increases their expected influence on the network: the next-generation online marketing tool?
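To make the first goal concrete, the sketch below computes two common centrality measures on a small directed "follows" graph: in-degree and PageRank by power iteration. It is a minimal illustration under assumptions made here (an edge list of (follower, followed) pairs, a damping factor of 0.85, a fixed iteration count); the project itself does not specify which centrality measures or parameters to use.

```python
def in_degree(edges):
    """Count incoming edges per node for a list of (src, dst) pairs."""
    counts = {node: 0 for edge in edges for node in edge}
    for _, dst in edges:
        counts[dst] += 1
    return counts


def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) / n for node in nodes}
        # Rank held by nodes without outgoing edges is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        for node in nodes:
            for dst in out_links[node]:
                nxt[dst] += damping * rank[node] / len(out_links[node])
        for node in nodes:
            nxt[node] += damping * dangling / n
        rank = nxt
    return rank


# Hypothetical trace: a and b follow c, and c follows a.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
degrees = in_degree(edges)
ranks = pagerank(edges)
```

For the third goal, scores like these could be compared against a measured influence signal for the same users.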
Data storage systems
Increasingly, scientific discoveries are driven by analysis of massively distributed bulk data. This has led to the proliferation of high-end mass storage systems and storage area clusters as storage fabric elements for supercomputing, offering an excellent price/performance ratio and good storage speed, but high administrative costs. A promising alternative is to harness the collective storage potential of individual workstations, much as we harness idle CPU cycles. However, such aggregated commodity storage is prone to volatility, machine failures, performance concerns and trust issues.
The MosaStore project is an effort to aggregate space and I/O bandwidth contributions from commodity desktop storage within a domain to provide a shared storage space.
MosaStore is under active development. You may choose to design and implement one of the following features:
- Security: design and implement the security mechanism (this includes authentication and access control).
- NameSpaces: Add directories to the existing MosaStore implementation.
- User account management: Extend the MosaStore management modules with user and group account management.
Performance evaluation of Amazon S3
Amazon’s Simple Storage Service (S3) is a storage infrastructure offering a simple cost model and SOAP and HTTP interfaces. The goal of the project is to improve the early results on S3 characterization described here and here.
Web2.0 Mashups
Collecting and Visualizing Data for Research Group Summaries
Generate visual summaries of research group collaborations (for example, similar to the ones presented here for individual researchers).
GPU-accelerated middleware primitives
MerkleGPU
GPU-accelerated construction and querying of Merkle trees
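For readers unfamiliar with Merkle trees, the following CPU-only sketch shows the two operations named above: construction (hashing blocks pairwise, level by level, up to a single root) and querying (producing and checking a membership proof for one block). The use of SHA-256, the rule of pairing an odd node with itself, and all function names are assumptions made for this illustration. The hashes within one level are independent of each other, which is the part a GPU version could compute in parallel.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [_hash(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [_hash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Sibling hashes from leaf to root for the block at `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd last node was paired with itself
        sibling_on_right = index % 2 == 0
        path.append((level[sibling], sibling_on_right))
        index //= 2
    return path


def verify(block, path, root):
    """Recompute the root from a block and its proof path."""
    node = _hash(block)
    for sibling, sibling_on_right in path:
        node = _hash(node + sibling) if sibling_on_right else _hash(sibling + node)
    return node == root


# Hypothetical data blocks.
blocks = [b"block-0", b"block-1", b"block-2"]
levels = build_levels(blocks)
root = levels[-1][0]
ok = verify(blocks[2], proof(levels, 2), root)
```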
High-performance Computing / Bio-informatics
Parallelization of MUMmer sequence alignment tool
MUMmer is a widely used bio-informatics application, more specifically a genomic sequence alignment tool. Basically, the tool performs a large number of substring matches: a large number of short strings (called “queries”) are searched in a very long string (called “reference string” or "reference genome").
The original version of MUMmer is sequential and runs on a single core only. In this project, you will implement, evaluate, and profile a parallel, multi-core version of MUMmer.
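The structure of the problem (many independent queries against one shared reference string) can be sketched as follows. This is not MUMmer's algorithm: a naive `str.find` scan stands in for the real matching, the function names are hypothetical, and Python is used only for brevity, whereas the project targets C. The worker threads here illustrate how the query set is partitioned; in CPython they do not provide true multi-core speedup for this kind of CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(reference, query):
    """Return every start index of `query` in `reference` (overlaps included)."""
    positions = []
    start = reference.find(query)
    while start != -1:
        positions.append(start)
        start = reference.find(query, start + 1)
    return positions


def match_chunk(reference, queries):
    """Match one partition of the query set against the shared reference."""
    return {query: find_all(reference, query) for query in queries}


def parallel_match(reference, queries, workers=4):
    """Split the queries across workers and merge the per-chunk results."""
    chunks = [queries[i::workers] for i in range(workers)]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda chunk: match_chunk(reference, chunk), chunks):
            results.update(partial)
    return results


# Hypothetical reference string and queries.
reference = "ACGTACGTTACG"
matches = parallel_match(reference, ["ACG", "TT", "GGG"], workers=2)
```

Because the reference is read-only and each query is independent, the partitions need no synchronization beyond merging the results, which is what makes the workload a natural fit for multi-core execution.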
Required skills/qualifications: Familiarity with shell scripting and the C language.