Record Display for the EPA National Library Catalog

Main Title Frontiers in massive data analysis /
Publisher The National Academies Press,
Year Published 2013
OCLC Number 857812965
ISBN 9780309287784; 0309287782
Subjects Mathematical statistics--Data processing ; Social sciences--Statistical methods ; Big data ; Data Mining ; Datenanalyse ; Massendaten
Internet Access
Description Access URL
http://www.nap.edu/catalog.php?record_id=18374
Holdings
Library  Call Number        Additional Info  Location                      Last Modified  Checkout Status
EKDM     QA276.4.F76 2013                    CEMM/EPD Library/Athens,GA    09/15/2014     STATUS
ELBM     QA276.4.F76 2013                    AWBERC Library/Cincinnati,OH  07/13/2022
Collation xiii, 176 pages ; 23 cm
Notes
Includes bibliographical references.
Contents Notes
1. Introduction. The challenge -- What has changed in recent years? -- Organization of this report -- 2. Massive data in science, technology, commerce, national defense, telecommunications, and other endeavors. Where are massive data appearing? -- Challenges to the analysis of massive data -- Trends in massive data analysis -- Examples -- 3. Scaling the infrastructure for data management. Scaling the number of data sets -- Scaling computing technology through distributed and parallel systems -- Trends and future research -- 4. Temporal data and real-time algorithms. Introduction -- Data acquisition -- Data processing, representation, and inference -- System and hardware for temporal data sets -- Challenges -- 5. Large-scale data representations. Overview -- Goals of data representation -- Challenges and future directions -- 6. Resources, trade-offs, and limitations. Introduction -- Relevant aspects of theoretical computer science -- Gaps and opportunities -- 7. Building models from massive data. Introduction to statistical models -- Data cleaning -- Classes of models -- Model tuning and evaluation -- Challenges -- 8. Sampling and massive data. Common techniques of statistical sampling -- Challenges when sampling from massive data -- 9. Human interaction with data. Introduction -- State of the art -- Hybrid human/computer data analysis -- Opportunities, challenges and directions -- 10. The seven computational giants of massive data analysis. Basic statistics -- Generalized n-body problems -- Graph-theoretic computations -- Linear algebraic computations -- Optimizations -- Integration -- Alignment problems -- Discussion -- 11. Conclusions -- Appendixes. A. Acronyms -- B. Biographical sketches of committee members.
"With information available from Internet sites around the globe and flowing over communication networks connecting billions of devices, today's society has access to an enormous amount of data.
Scientific communities and the defense and intelligence enterprise are also generating massive amounts of data from experiments, observations, and numerical simulations. Some Internet-based companies are dealing with data measured in exabytes (a billion billion bytes), and many other sources are producing terabytes or even petabytes of data. While systems have been developed to store and manage such massive amounts of data, some of which streams by and is only examined "on the fly," our ability to infer knowledge from data at this scale is limited. A major challenge is developing statistically well-founded procedures that allow us to control the inevitable errors; many traditional tools of data analysis are not feasible at this scale. Frontiers in Massive Data Analysis describes the cross-disciplinary skill set that data analysts need to address the challenges of exploiting big data. It identifies gaps in current capabilities and recommends promising research directions in multiple component areas, ranging from data representation to methods for including humans in the data-analysis loop. The report also proposes a list of key computational problems, the "seven computational giants" of massive data analysis"--Back cover.