
Project Information

DENSE AND SPARSE METHODS IN HIGH-DIMENSIONAL DATA ANALYSIS

Agency:
NSF (National Science Foundation)

Project Number:
1208785
Contact PI / Project Leader:
DICKER, LEE
Awardee Organization:
RUTGERS THE ST UNIV OF NJ NEW BRUNSWICK

Description

Abstract Text:
Many methods for high-dimensional data analysis begin with the assumption that the parameter of interest is, in some sense, sparse. Furthermore, the performance of many of these methods depends on the sparsity of the underlying parameters. However, statistical methods for checking sparsity assumptions and determining the implications of the absence or near-absence of sparsity are lacking. The driving goal of this project is to develop practical statistical tools for identifying situations where the relevant parameters are in fact sparse, or where sparse methods for high-dimensional data analysis may be applied effectively. Problems considered in this project will primarily be studied within the context of the linear model and the Gaussian location model. Methods will be assessed by decision-theoretic criteria (e.g., asymptotic minimaxity). A null model based on dense (non-sparse) signals and dense estimation and prediction methods will be developed and thoroughly studied. This will provide a rich framework for sparsity testing, where the aim is to identify settings in which sparse methods are likely to be successful. Specific sparsity testing procedures will be proposed and analyzed.
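
The dependence of performance on sparsity can be illustrated with a small simulation in the Gaussian location model, y_i = theta_i + z_i with standard normal noise. The sketch below is not taken from the project and does not represent its methods. It assumes two textbook estimators as stand-ins: soft thresholding at the universal threshold sqrt(2 log n) as the "sparse" method, and positive-part James-Stein shrinkage as the "dense" method. The signal sizes, the seed, and all function names are illustrative choices.

```python
import math
import random


def soft_threshold(y, t):
    """Sparse-style estimator: shrink each coordinate toward 0 by t, zeroing small ones."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in y]


def james_stein(y):
    """Dense-style estimator: positive-part James-Stein shrinkage (unit noise variance)."""
    n = len(y)
    energy = sum(v * v for v in y)
    c = max(0.0, 1.0 - (n - 2) / energy)
    return [c * v for v in y]


def squared_error(estimate, theta):
    """Total squared error between an estimate and the true parameter vector."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def compare(theta, rng):
    """Draw y = theta + N(0, 1) noise once and report the loss of both estimators."""
    y = [t + rng.gauss(0.0, 1.0) for t in theta]
    threshold = math.sqrt(2.0 * math.log(len(theta)))  # universal threshold
    return {
        "soft": squared_error(soft_threshold(y, threshold), theta),
        "james_stein": squared_error(james_stein(y), theta),
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed sparse signal: 20 large coordinates, the remaining 1980 exactly zero.
sparse_theta = [10.0] * 20 + [0.0] * (n - 20)
# Assumed dense signal: every coordinate is a small nonzero N(0, 1) draw.
dense_theta = [rng.gauss(0.0, 1.0) for _ in range(n)]

sparse_result = compare(sparse_theta, rng)
dense_result = compare(dense_theta, rng)

print("sparse signal:", sparse_result)
print("dense signal: ", dense_result)
```

Under these assumptions, thresholding should incur the smaller loss on the sparse signal and the shrinkage estimator the smaller loss on the dense one, which is the kind of contrast a sparsity test would aim to anticipate before a method is chosen.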

High-dimensional data analysis is one of the most active areas of current statistical research. Much of this research has been driven by technological advances that have enabled researchers to collect vast datasets with relative ease in a variety of scientific disciplines, including astrophysics, geological sciences, molecular biology, and genomics. In high-dimensional datasets, many features are measured for each unit of observation (e.g., thousands of gene expression levels may be measured for each individual in a genomic study). Sparsity plays a major role in much of the research on high-dimensional data analysis. Broadly speaking, sparsity measures the degree to which a specified outcome may be described by relatively few features. Sparse methods for high-dimensional data analysis attempt to leverage sparsity in the underlying dataset and have proven to be very effective in many applications, especially in engineering and signal processing. On the other hand, the performance of sparse methods has been more mixed in other important applications where high-dimensional data are abundant, such as genomics. In this project, the investigator will develop statistical methods for characterizing and identifying situations where sparse methods can be successfully applied. This will be achieved by developing tools for determining the level of sparsity in high-dimensional datasets. These methods, when applied to a given dataset, will help researchers determine the validity of subsequent statistical analyses and the potential benefits of using sparse methods for these analyses. This research is likely to have significant implications for understanding reproducibility in high-dimensional data analysis and broad applications in the analysis of genomic data. The methods developed during the course of this project will be used in collaborative work with highly experienced researchers in genomics.
Project Terms:
Area; Automobile Driving; base; computerized data processing; Data; Data Analyses; Data Set; Discipline; Engineering; experience; Gene Expression; Genomics; Goals; Individual; interest; Linear Models; Location; Measures; Methods; Modeling; Molecular Biology; Outcome; Performance; Play; Procedures; Relative (related person); Reproducibility; Research; Research Personnel; Role; Science; Signal Transduction; Specific qualifier value; Statistical Methods; Testing; tool; Work

Details

Contact PI / Project Leader Information:
Name:  DICKER, LEE
Other PI Information:
Not Applicable
Awardee Organization:
Name:  RUTGERS THE ST UNIV OF NJ NEW BRUNSWICK
City:  NEW BRUNSWICK    
Country:  UNITED STATES
Congressional District:
State Code:  NJ
District:  06
Other Information:
Fiscal Year: 2012
Award Notice Date:
DUNS Number: 001912864
Project Start Date: 01-Aug-2012
Budget Start Date:
CFDA Code: 47.049
Project End Date: 31-Jul-2015
Budget End Date:
Agency: National Science Foundation
Project Funding Information for 2012:
Year    Agency                               FY Total Cost
2012    NSF (National Science Foundation)    $159,995
