Metadata
Homework1
Clean & Document dataset
Homework should be committed to your github.umn.edu repository by 4:30 Monday Feb 19.
You will work in the named repository (ENT5920-Lastname) you created in Week 3.
Using a dataset of your own, preferably one you will be using in your
research, clean and document the data as follows:
-
Choose a limited dataset, 5 - 10 columns of data. Number of
rows is less important except as it impedes your ability to post
the data to github.
- Create an R script which does the following:
- reads the data in, using relativized (not hard coded) paths
- gives a summary table of the number of identifiers
- places, dates, species if applicable
- produces numerical summaries of variables
- ranges, central tendencies, quartiles
- histograms of values
- correlation among variables
- cleans any errors found, preferably with generic function
- outputs a clean dataset to a new location with datestamp
- Create a metadata record for this dataset. The format of
the metadata document should follow a standard for your
field, if possible. If not, associate the metadata with
the data by adding it to the header of the data document. The metadata record
should include:
- The project, and investigator(s) responsible for the data
- The geographic and temporal restriction of the data
- a short (1 paragraph) abstract of the purpose & methods
- a data dictionary with column definitions
-
Commit the data, QC script, and metadata documents to your
local git repository.
-
Push the changes to your remote (github.umn.edu) repository.
- Ensure both Dan (dcarivea) and Eric (elind) are added as collaborators
to your repository.