The Big Data Lab was funded in 2014 by a New York State grant of $1 million. Its purpose is to educate students in all aspects of large and distributed information systems (e.g., system development, testing, maintenance, data security and privacy, data integration, networking, cybersecurity, and application development) and to prepare them for highly skilled jobs in emerging, fast-growing IT industries such as cloud computing, health care informatics, finance, data integration, and data analytics.
The lab consists of a server room and a computer lab of 20 workstations. The equipment in the server room is organized as a VMware ESXi cluster built out of 21 individual servers. The cluster has an overall RAM of 3 TB, 420 CPU cores (840 threads), and a total hard drive capacity of 1,000 TB. The server room has a cooling capacity of 12 tons (roughly 12 large home air conditioners), a power capacity of 60 kW (roughly five homes on 100 A service), and half a kilometer of networking cable.
Each of the 21 servers in the VMware ESXi cluster consists of:
- IBM X3650 M4 Big Data model
- 2 x 8-core Xeon CPUs @ 2.2 GHz
- 128 GB RAM
- 9 x 4 TB HDDs (RAID 6)
- 2 x 10 Gb network cards
- 4 x 1 Gb network cards
- OS: VMware ESXi 6.0 hypervisor
CSC143: Semantic Web
Faculty: Dr. Steven Lindo
The Semantic Web is an evolution of the current WWW in which data is represented as meaningful knowledge. The crux of the Semantic Web lies in the semantic representation of data and reasoning over it using description logic ontologies, which is particularly useful for classifying large amounts of unstructured data. Ontology reasoning over big data requires ample storage and processing power. The Big Data Lab contains 100 TB of cloud storage and 420 TB of storage in data servers that students will use to build Semantic Web applications.
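The core idea students work with can be sketched without any ontology toolkit: facts are stored as subject-predicate-object triples, and a reasoner derives new classifications from them. Below is a minimal, illustrative sketch (the class names and the single RDFS-style subclass rule are assumptions for the example; a real course project would use a description logic reasoner):

```python
# Toy triple store: facts as (subject, predicate, object) tuples.
# Class hierarchy and individuals here are illustrative only.
triples = {
    ("Patient", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("alice", "type", "Patient"),
}

def infer_types(triples):
    """Derive every class an individual belongs to by following
    subClassOf edges transitively (a toy RDFS-style entailment)."""
    subclass = {}
    for s, p, o in triples:
        if p == "subClassOf":
            subclass.setdefault(s, set()).add(o)
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "type":
                for parent in subclass.get(o, ()):
                    fact = (s, "type", parent)
                    if fact not in inferred:
                        inferred.add(fact)
                        changed = True
    return inferred

facts = infer_types(triples)
```

After inference, `alice` is classified not only as a `Patient` but also as a `Person` and an `Agent`, which is exactly the kind of automatic classification that makes ontologies useful for unstructured data.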
CSC145R: Cloud Computing for Big Data
Faculty: Dr. Bo Tang
This course will provide a comprehensive study of cloud concepts and technologies across the various cloud service models, including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Business Process as a Service (BPaaS). It will examine cloud computing in detail and offer a hands-on study of using cloud computing for big data analytics, focusing on data mining and knowledge discovery from big data. Fundamental security models and associated challenges will be introduced. Students will complete a project and present it as part of the course.
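The data-mining side of the course centers on the MapReduce pattern that frameworks such as Hadoop apply at scale. A minimal single-machine sketch of the pattern (the input records are illustrative) shows the three phases — map each record to key/value pairs, shuffle by key, reduce each group:

```python
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) for every word in every record.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Group all values by key, as the framework's shuffle step would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

logs = ["cloud big data", "big data analytics"]
counts = reduce_phase(shuffle(map_phase(logs)))
```

On a real cluster the map and reduce phases run in parallel across many machines, but the logic per key is the same as this sketch.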
CSC175: Computer Networking
Faculty: Dr. Chuck Liang
A technical introduction to data communication. Topics include the OSI Reference Model, layer services, protocols, LANs, packet switching and X.25, ISDN, file transfer, virtual terminals, system management, and distributed processing.
The Big Data Lab hosts a dedicated server and switch for teaching students basic networking concepts, which are fundamental for big data processing. Each student workstation contains three network interface cards, allowing a high degree of flexibility in network configuration. Students will engage in networking experiments such as subnet mapping and packet tracing. A special dedicated ISP line is also available to isolate the lab from the rest of campus, so that students' activities have no adverse effect on the campus network.
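A subnet-mapping exercise of the kind described above can be sketched with Python's standard `ipaddress` module; the 10.0.0.0/24 lab network and the split into four subnets are illustrative assumptions, not the lab's actual addressing plan:

```python
import ipaddress

# An assumed /24 network for the lab.
lab_net = ipaddress.ip_network("10.0.0.0/24")

# Divide it into four equal /26 subnets (prefix length + 2).
subnets = list(lab_net.subnets(prefixlen_diff=2))

def find_subnet(host, subnets):
    """Return the subnet that contains a given host address."""
    addr = ipaddress.ip_address(host)
    for net in subnets:
        if addr in net:
            return net
    return None

net = find_subnet("10.0.0.70", subnets)
```

Here `10.0.0.70` falls in the second subnet, `10.0.0.64/26` — the kind of reasoning students practice before tracing real packets on the isolated line.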
CSC190: Software Engineering
Faculty: Dr. Xiang Fu
Students study the nature of the program development task when many people, modules, and versions are involved in designing, developing, and maintaining a large program or system. Issues addressed include program design, specification, version control, cost estimation, and management. Students work in small teams on the cooperative examination and modification of existing systems. The course has an oral communication component including group and individual presentations.
Mini Assignment: Your team will have access to a Windows Server 2008 R2 machine. Using this server, you and your colleagues will design, implement, test, deliver, and publish a collection of web services for an educational stock exchange platform named "Hofstra Stock Exchange" (HSE). HSE must provide basic user management and stock trading functions, as well as stock history queries for all NYSE stocks dating from 1/1/1980. You have to optimize the performance of your web services and provide a complete functional and performance testing report at the end of the semester.
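One HSE service, the stock history query, can be sketched as a plain function over an in-memory table. The ticker, dates, and prices below are made-up placeholders, and a real implementation would back this with a database and expose it as a web service endpoint:

```python
from datetime import date

# history[ticker] is a list of (trading_day, closing_price) pairs.
# All values here are illustrative placeholders, not real NYSE data.
history = {
    "IBM": [
        (date(1980, 1, 2), 16.25),
        (date(1980, 1, 3), 16.50),
        (date(1980, 1, 4), 16.38),
    ],
}

def query_history(ticker, start, end):
    """Return (day, close) rows for a ticker between start and end,
    inclusive -- the core of a history-query web service."""
    return [(d, p) for d, p in history.get(ticker, [])
            if start <= d <= end]

rows = query_history("IBM", date(1980, 1, 3), date(1980, 1, 4))
```

The performance-testing part of the assignment would then measure this query under load and add indexing or caching where needed.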
Research Project funded by NSF
Title: WISE Guys and Gals - Boys & Girls as WISEngineering STEM Learners
Faculty: Dr. Xiang Fu, Dr. David Burghardt
Student: Tyler Befferman
WISEngineering is a web-based educational system that supports NSF-AISL 1422436, "WISE Guys and Gals - Boys & Girls as WISEngineering STEM Learners." The system integrates advanced features such as user behavior tracking and automated grading in support of an engineering curriculum in an informal learning environment. A highly reliable data storage system (based on HBase) is being developed to store the huge amount of user behavior data the system generates every day. An automated grading module, based on Hadoop and EDX EASE, is used to train and calibrate an automatic grading engine that assesses student performance. The project uses a 10 TB NAS server and runs on a cluster of six nodes.
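Storing high-volume behavior events in HBase hinges on row-key design. The field layout below is an assumption for illustration, not the project's actual schema: because HBase sorts rows lexicographically, prefixing the key with a user id and a zero-padded timestamp keeps each user's events contiguous and time-ordered, so per-user scans are cheap:

```python
def row_key(user_id, timestamp_ms, event_type):
    """Build a lexicographically sortable row key in the style used
    for HBase tables. Zero-padding the millisecond timestamp to a
    fixed width makes string order match time order."""
    return f"{user_id}:{timestamp_ms:013d}:{event_type}"

# Two events for the same (hypothetical) user, inserted out of order:
keys = sorted([
    row_key("u42", 1400000001000, "submit"),
    row_key("u42", 1400000000000, "click"),
])
```

Sorting the keys as strings recovers chronological order, which is exactly the property an HBase range scan over one user's prefix relies on.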
Master Capstone Project
Title: Enhancing Malware Trace Mining With Cloud Caching
Student: John Cammarano
Faculty Advisor: Dr. Xiang Fu
Malware is progressively increasing in sophistication and, in the most recent cases, is capable of detecting its run-time environment. This poses a significant problem for traditional analysis methods, which use emulators and virtual systems to execute the malware and observe its behavior. A new analysis tool, EVMine, analyzes this type of malware by caching large data files on the hard disk that contain slices of malware traces. To improve the tool's overall performance, the core caching mechanism was rebuilt on Apache HBase to provide capabilities similar to Google BigTable.
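The caching idea can be illustrated independently of HBase. The sketch below is an in-memory LRU stand-in, not the actual EVMine code: recently used trace slices stay resident, and only misses fall back to the slow loader (disk or HBase in the real system); slice ids and the loader are illustrative:

```python
from collections import OrderedDict

class TraceCache:
    """A small LRU cache for trace slices, keyed by slice id."""

    def __init__(self, capacity, load_slice):
        self.capacity = capacity
        self.load_slice = load_slice  # fallback loader (disk/HBase)
        self.entries = OrderedDict()

    def get(self, slice_id):
        if slice_id in self.entries:
            self.entries.move_to_end(slice_id)  # mark recently used
            return self.entries[slice_id]
        data = self.load_slice(slice_id)        # slow path on a miss
        self.entries[slice_id] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recent
        return data

loads = []
cache = TraceCache(2, lambda s: loads.append(s) or f"trace:{s}")
cache.get("a"); cache.get("a"); cache.get("b"); cache.get("c")
```

After these four lookups the loader ran once per distinct slice, and slice "a" (least recently used) has been evicted to make room — the same hit/miss behavior the HBase-backed cache provides at disk scale.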
Title: A Hadoop Based Trading Platform and Financial Optimization System
Student: Steve Spano
Faculty: Dr. Xiang Fu
Brief description: We propose to develop an educational stock trading platform and a research tool for optimizing the parameters of financial models. The system leverages the distributed HBase database to store the vast amount of historical trading data. A research engine, based on Hadoop, accepts uploads of the bytecode of financial models implemented in Java. By performing reflection analysis on a financial model program, the research engine automatically extracts its parameters and runs hundreds of parallel threads against the historical data to find the optimal parameter values.
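The reflection step can be sketched in Python rather than Java bytecode analysis (the toy model and its parameters below are assumptions for illustration): given a model callable, the engine discovers its tunable parameters automatically instead of requiring the author to declare them:

```python
import inspect

def moving_average_model(prices, short_window=5, long_window=20):
    """A toy financial model: signal = short average minus long average.
    Windows are the tunable parameters a research engine would sweep."""
    return (sum(prices[-short_window:]) / short_window
            - sum(prices[-long_window:]) / long_window)

def extract_parameters(model):
    """Use reflection to find the model's tunable parameters:
    keyword parameters that carry a default value."""
    sig = inspect.signature(model)
    return [name for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty]

params = extract_parameters(moving_average_model)
```

Here reflection reports `short_window` and `long_window` as the tunables; the Hadoop engine would then evaluate many candidate values for each in parallel against the historical data in HBase.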
The datacenter hosts a cloud computing infrastructure supported by a VMware vSphere cluster of 21 servers. VMware vSphere is a virtualization platform for running virtual machines (VMs) in a large-scale computing environment. The vSphere Client connects to a vCenter Server to access and manage the virtual machines.