Big Data Lab

  • Description


    The Big Data Lab was funded in 2014 by a New York state grant of $1 million. The purpose of the Big Data Lab is to educate students in all aspects of large and distributed information systems (e.g., system development, testing, maintenance, data security and privacy, data integration, networking, cyber-security and application development) and thus prepare them for highly skilled jobs in emerging and fast-growing IT industries such as cloud computing, health-care informatics, finance, data integration and data analytics.

    The lab consists of a server room and a computer lab of 20 workstations. The equipment in the server room is organized as a VMware ESXi cluster built from 21 individual servers. The cluster has 3 TB of RAM overall, 420 CPU cores (840 threads), and a total hard-drive capacity of 1,000 TB. The server room has a cooling capacity of 12 tons (roughly 12 large home air conditioners), a power capacity of 60 kW (roughly five homes with 100-amp service), and 1/2 km of networking cable.

    Each of the 21 servers in the VMware ESXi cluster is configured as follows:

    • IBM X3650 M4 Big Data model
    • 2 x 8-core Xeon CPU @ 2.2 GHz
    • 128 GB RAM
    • 9 x 4 TB HDD (RAID 6)
    • 2 x 10 Gb network card
    • 4 x 1 Gb network card
    • OS: VMware ESXi 6.0 hypervisor
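
As a rough cross-check, the per-server figures in the list can be multiplied out across the cluster. The sketch below assumes that all 21 servers match the listed configuration, that each core runs two hardware threads, and that RAID 6 reserves two drives' worth of capacity for parity. The constant and function names are illustrative only. The results need not equal the cluster totals quoted in the description, which may reflect a different accounting or additional hardware.

```python
# Per-server figures copied from the configuration list.
SERVERS = 21
CPUS_PER_SERVER = 2
CORES_PER_CPU = 8
RAM_GB_PER_SERVER = 128
DRIVES_PER_SERVER = 9
DRIVE_TB = 4
RAID6_PARITY_DRIVES = 2  # RAID 6 stores two drives' worth of parity


def cluster_totals(servers: int = SERVERS) -> dict:
    """Multiply the per-server figures out across the whole cluster."""
    cores = servers * CPUS_PER_SERVER * CORES_PER_CPU
    data_drives = DRIVES_PER_SERVER - RAID6_PARITY_DRIVES
    return {
        "cores": cores,
        "threads": cores * 2,  # assumption: two hardware threads per core
        "ram_gb": servers * RAM_GB_PER_SERVER,
        "raw_disk_tb": servers * DRIVES_PER_SERVER * DRIVE_TB,
        "usable_disk_tb": servers * data_drives * DRIVE_TB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

The gap between raw and usable disk capacity shows what RAID 6 costs in exchange for tolerating two failed drives per array.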
  • Courses


    CSC143: Semantic Web
    Faculty: Dr. Steven Lindo
    Spring 2017

    The Semantic Web is an evolution of the current WWW in which data is represented as meaningful knowledge. The crux of the Semantic Web is the semantic representation of, and reasoning over, data using description logic ontologies, which is particularly useful for classifying large amounts of unstructured data. Ontology reasoning over big data requires ample storage and processing power. The Big Data Lab provides 100 TB of cloud storage and 420 TB of storage in data servers, which students will use to build semantic web applications.

    CSC145R: Cloud Computing for Big Data
    Faculty: Dr. Bo Tang
    Spring 2017

    This course will provide a comprehensive study of Cloud concepts and technologies across the various Cloud service models, including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Business Process as a Service (BPaaS). It will examine Cloud computing in detail and offer a hands-on study of using Cloud computing for big data analytics, with a focus on data mining and knowledge discovery from big data. Fundamental security models and associated challenges will be introduced. Students will complete a project and present it as part of the course.

    CSC175: Computer Networking
    Faculty: Dr. Chuck Liang
    Spring 2017

    A technical introduction to data communication. Topics include the OSI Reference Model, layer services, protocols, LANs, packet switching and X.25, ISDN, file transfer, virtual terminals, system management and distributed processing.

    The Big Data Lab hosts a dedicated server and switch for teaching students basic networking concepts, which are fundamental for big data processing. Each student workstation contains three network interface cards, which will allow a high degree of flexibility in network configuration. Students will engage in networking experiments such as subnet mapping and packet tracing. A special dedicated ISP line is also available to isolate the lab from the rest of campus so that the students' activities will have no adverse effect.
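
A subnet-mapping exercise of the kind mentioned above could look like the following minimal sketch, which uses Python's standard `ipaddress` module. The address block `10.20.0.0/24` and the helper names are hypothetical and are not taken from the lab's actual network configuration.

```python
import ipaddress

# Hypothetical private address block; not the lab's real addressing plan.
LAB_NETWORK = ipaddress.ip_network("10.20.0.0/24")


def split_subnets(network, new_prefix):
    """Return the subnets of `network` that have the given prefix length."""
    return list(network.subnets(new_prefix=new_prefix))


def find_subnet(address, subnets):
    """Return the subnet that contains `address`, or None if none does."""
    ip = ipaddress.ip_address(address)
    for subnet in subnets:
        if ip in subnet:
            return subnet
    return None


if __name__ == "__main__":
    subnets = split_subnets(LAB_NETWORK, 26)
    for subnet in subnets:
        # num_addresses also counts the network and broadcast addresses.
        print(subnet, subnet.num_addresses - 2, "usable hosts")
    print(find_subnet("10.20.0.130", subnets))
```

Splitting a /24 into /26 blocks yields four subnets of 64 addresses each, which is a convenient size for assigning one subnet to each of a workstation's network interfaces.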

    CSC190: Software Engineering
    Faculty: Dr. Xiang Fu
    Spring 2017

    Students study the nature of the program development task when many people, modules and versions are involved in designing, developing and maintaining a large program or system. Issues addressed include program design, specification, version control, cost estimation and management. Students work in small teams on the cooperative examination and modification of existing systems. The course has an oral communication component including group and individual presentations.

    Mini-assignment: your team will have access to a Windows Server 2008 R2 machine. Using this server, you and your colleagues will design, implement, test, deliver, and publish a collection of web services for an educational stock exchange platform named "Hofstra Stock Exchange" (HSE). HSE has to provide basic user management and stock trading functions. It should also provide stock history query functions for all NYSE stocks from 1/1/1980 onward. You have to optimize the performance of your web services and provide a complete functional and performance testing report at the end of the semester.

  • Faculty


    Xiang Fu
    Associate Professor
    (516) 463-4787
    103 Adams Hall
    Research Areas: Software Engineering, Formal Verification, Model Checking, Information Security, Web Services

    Chuck C. Liang
    (516) 463-5559
    102 Adams Hall
    Research Areas: Programming languages, Type theory, Compilers

  • Ongoing Research

    Research Project funded by NSF
    Title: WISE Guys and Gals - Boys & Girls as WISEngineering STEM Learners
    Faculty: Dr. Xiang Fu, Dr. David Burghardt
    Student: Tyler Befferman

    WISEngineering is a web-based educational system that supports NSF-AISL 1422436 "WISE Guys and Gals - Boys & Girls as WISEngineering STEM Learners". The system integrates advanced features such as user behavior tracking and automated grading to support an engineering curriculum in an informal learning environment. A highly reliable data storage system (based on HBase) is being developed to store the large volume of user behavior data generated by the system every day. An automated grading module, based on Hadoop and edX EASE, is used to train and calibrate an automatic grading engine that assesses student performance. The project uses a 10 TB NAS server and runs on a cluster of 6 nodes.

    Master Capstone Project
    Title: Enhancing Malware Trace Mining With Cloud Caching
    Student: John Cammarano
    Fall 2014
    Faculty Advisor: Dr. Xiang Fu


    Malware is steadily increasing in sophistication and, in the most recent cases, is capable of detecting its run-time environment. This poses a significant problem for traditional analysis methods, which use emulators and virtual systems to execute the malware and observe its behavior. A new analysis tool, EVMine, analyzes this type of malware by caching large data files on the hard disk that contain slices of malware traces. To improve the overall performance of the tool, the core functionality of the caching mechanism was altered to provide capabilities similar to Google Bigtable by using Apache HBase.

    Senior Project
    Title: A Hadoop Based Trading Platform and Financial Optimization System
    Student: Steve Spano
    Faculty: Dr. Xiang Fu
    Fall 2014


    Brief description: We propose to develop an educational stock trading platform and a research tool for optimizing the parameters of financial models. The system uses the distributed HBase database to store the vast amount of historical trading data. The research engine is based on Hadoop and accepts uploaded bytecode of financial models implemented in Java. By performing reflection analysis on a financial model program, the research engine automatically extracts the model's parameters and runs hundreds of parallel threads against the historical data to find the optimal parameter values.
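
The parameter-extraction idea can be illustrated with a small sketch. It is not the project's implementation: the project analyzes Java bytecode by reflection and runs on Hadoop, whereas this sketch swaps in Python's `inspect` module and a local thread pool. The toy model, the function names, and the candidate values are all hypothetical.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def moving_average_model(prices, short_window=2, long_window=4):
    """Hypothetical toy model: hold for one step when short MA > long MA."""
    profit = 0.0
    for i in range(long_window, len(prices) - 1):
        short_ma = sum(prices[i - short_window:i]) / short_window
        long_ma = sum(prices[i - long_window:i]) / long_window
        if short_ma > long_ma:
            profit += prices[i + 1] - prices[i]
    return profit


def tunable_parameters(model):
    """Find the model's tunable parameters (those with defaults) by introspection."""
    signature = inspect.signature(model)
    return [
        name
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    ]


def optimize(model, prices, candidates, workers=4):
    """Score every combination of candidate values in parallel; return the best."""
    names = tunable_parameters(model)
    grid = [
        dict(zip(names, combo))
        for combo in product(*(candidates[name] for name in names))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda kwargs: model(prices, **kwargs), grid))
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best]


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5, 6]  # made-up price series
    search_space = {"short_window": [1, 2], "long_window": [3, 4]}
    print(optimize(moving_average_model, history, search_space))
```

The point of the introspection step is that the search code never hard-codes parameter names, so a different model function can be passed in without changing `optimize`.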

  • Lab Resources

    VMware vSphere

    The datacenter hosts a cloud computing infrastructure supported by a VMware vSphere cluster of 21 servers. VMware vSphere is a virtualization platform for running virtual machines (VMs) in a large-scale computing environment. The vSphere Client connects to a vCenter Server to access and manage the virtual machines.

    How to access virtual machines in the Big Data Lab

    VPN remote access

    The Computer Science VPN allows students and faculty to access resources of the Big Data Lab from off campus over the Internet.  An OpenVPN client and a client profile from are required to connect to the VPN.
