December 14, 2014

Stata Learning Notes

(I.) Commands:

log using: Opens and names a log file to record your Stata session
log close: Closes the log file of your Stata session
cd: Changes the Stata working directory
cf: Compares two datasets
dir: Lists the contents of the current Stata working directory
infile: Loads an unformatted text-format data file into memory
insheet: Loads certain text-format data files (e.g., comma- or tab-delimited) into memory
describe: Displays a summary of the current dataset in memory
sort: Orders or re-orders the observations in the current dataset
summarize: Displays descriptive summary statistics for numeric variables
list: Displays values of some or all variables in the current dataset
save: Saves the current dataset in memory as a Stata-format dataset
use: Loads a Stata-format data file into memory
label variable: Assigns a descriptive label to a variable
label data: Assigns a label to the dataset
exit: Ends a Stata session
generate: Creates new variables from expressions containing existing variables, operators, and functions
replace: Modifies the contents (values) of existing variables
compare: Compares two variables and reports the differences and similarities between them
drop: Eliminates variables or observations from the dataset in memory
label values: Attaches a previously defined value label to a variable
label define: Defines a value label, i.e., a set of text labels for the distinct numeric values of a variable
graph bar: Draws bar charts of sample means of numeric variables
correlate: Displays the correlation matrix (or, with the covariance option, the covariance matrix) for two or more numeric variables
tabulate: Produces one-way and two-way frequency tables for categorical variables
table: Produces tables of summary statistics for the categories (values) of categorical variables
codebook: Displays properties of variables in the current dataset
regress: Performs OLS estimation of linear regression models
_b[varname]: Contains the coefficient estimate for the regressor varname
_se[varname]: Contains the standard error of the coefficient estimate for the regressor varname
vce: Displays the estimated covariance matrix of the coefficient estimates
matrix get: Accesses the coefficient estimates and their covariance matrix
display: Computes and displays the values of algebraic expressions
scalar: Defines the contents of scalar variables
scalar list: Lists the names and values of currently defined scalar variables
scalar drop: Eliminates previously defined scalars from memory
matrix: Defines matrices and performs matrix computations
matrix list: Lists the contents of a vector or matrix
predict: Computes predicted (fitted) values of Y and OLS residuals
graph twoway: Draws scatterplots of sample data points and line graphs of OLS sample regression functions
test: Used after OLS estimation to compute F-tests of linear equality restrictions on the coefficients
return list: Lists the results temporarily saved in r() by the most recent r-class command, such as test
lincom: Used after OLS estimation to compute estimates, standard errors, and two-tail t-tests for linear combinations of regression coefficients, including individual coefficients

(II.) Stata Basics

Recording a Stata Session

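log using XX.log             // start recording to a new log file named XX.log
log using XX.log, replace    // alternative: overwrite XX.log if it already exists
log off                      // temporarily suspend logging
log on                       // resume logging
log close                    // close the log file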
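A typical do-file wraps the whole session in a log. The sketch below assumes a hypothetical log name, session_notes.log, and Stata's bundled example dataset auto.dta. Running capture log close first closes any log left open by an earlier run (and does nothing if none is open), so the do-file can be rerun without an error.

```stata
// Close any log left open by an earlier run; capture suppresses the error if none is open
capture log close
// Open a plain-text log (hypothetical name), overwriting any earlier file of that name
log using session_notes.log, replace text
sysuse auto, clear
describe
// Output produced between log off and log on is not recorded
log off
summarize price
log on
log close
```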

Summarizing the contents of the current dataset – describe

describe

Displaying the values of variables – list

list
list in 1/20

Calculating descriptive summary statistics – summarize

summarize
summarize price mpg weight if mpg > 20
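summarize, like many Stata commands, leaves its results in r(), which is what return list (from the command list above) displays. A short sketch, assuming Stata's bundled auto.dta, whose variables price, mpg, weight and foreign match the ones used in these notes:

```stata
sysuse auto, clear
// Statistics restricted to cars with mpg above 20
summarize price mpg weight if mpg > 20
// The detail option adds percentiles, skewness and kurtosis
summarize price, detail
// Show everything summarize saved in r()
return list
display "median price = " r(p50)
```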

Drawing Bar Charts – graph bar

graph bar (mean) varname, over(category)
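As a concrete instance of the template above, again assuming auto.dta: each bar is the mean of price within one category of foreign, and tabulate with its summarize() option prints the same group means as a table for checking the chart.

```stata
sysuse auto, clear
// One bar per category of foreign; bar height = mean price in that group
graph bar (mean) price, over(foreign)
// The same group means (plus standard deviations and counts) in table form
tabulate foreign, summarize(price)
```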

Creating new variables from existing variables – generate

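generate weightsq = weight^2     // square of weight
gen lnprice = ln(price)          // natural log of price; gen abbreviates generate
generate price1 = price          // assumption: price1 is a copy of price (not defined in these notes)
list price price1 in 1/20
summarize price price1
compare price price1
drop weightsq lnprice price1     // remove the variables created above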
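The commands generate, replace, label define and label values usually appear together when building a categorical variable. A minimal sketch, assuming auto.dta; the variable heavy, the 3,000 lb cutoff and the label name heavylbl are hypothetical choices for illustration.

```stata
sysuse auto, clear
// generate: 1 for cars heavier than 3,000 lbs, 0 otherwise
generate heavy = weight > 3000
// replace: missing values count as larger than any number in Stata,
// so reset heavy to missing wherever weight is missing
replace heavy = . if missing(weight)
// label define creates the value label; label values attaches it to heavy
label define heavylbl 0 "Light" 1 "Heavy"
label values heavy heavylbl
label variable heavy "Weight over 3,000 lbs"
// Two-way frequency table of the new variable against foreign
tabulate heavy foreign
```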

Computing sample correlations and covariances – correlate

correlate price mpg weight foreign
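By default correlate reports correlations; the covariance option reports variances and covariances instead. A small check, assuming auto.dta and illustrative scalar names rho and cov_pw, that the two are linked by rho = cov(price, weight) / (sd(price) * sd(weight)). With exactly two variables, correlate saves r(rho), and with the covariance option also r(cov_12), r(Var_1) and r(Var_2).

```stata
sysuse auto, clear
// Pearson correlation of two variables
correlate price weight
scalar rho = r(rho)
// Same two variables, covariance matrix instead
correlate price weight, covariance
scalar cov_pw = r(cov_12)
// Correlation recomputed from the covariance and the two variances
display "rho = " rho "   cov/(sd*sd) = " cov_pw / sqrt(r(Var_1) * r(Var_2))
```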

OLS estimation of linear regression models – regress, for the model $\text{price}_i = \beta_1 + \beta_2\,\text{weight}_i + u_i$

regress price weight
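The command list names _b, _se, scalar, matrix, test, return list and lincom without showing them together. A sketch of how they might be combined after fitting price = b1 + b2*weight + u, assuming auto.dta; the names tstat, Fstat, b and V are illustrative. With a single restriction, the F statistic reported by test equals the square of the coefficient's t statistic, which the sketch checks.

```stata
sysuse auto, clear
quietly regress price weight
// Coefficient estimate and standard error for weight
display "b = " _b[weight] "   se = " _se[weight]
// t statistic for H0: coefficient on weight = 0
scalar tstat = _b[weight] / _se[weight]
scalar list tstat
// F-test of the same restriction; save its F statistic before r() is overwritten
test weight = 0
return list
scalar Fstat = r(F)
// Copy the coefficient vector and its covariance matrix from e()
matrix b = e(b)
matrix V = e(V)
matrix list b
matrix list V
// Estimate, standard error and two-tail t-test for the (trivial) combination "weight"
lincom weight
```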

Generating and Graphing Predicted Values and Residuals After OLS Estimation

log using 351tutorial4.log
describe
summarize
codebook price weight mpg foreign
regress price weight

Calculating predicted values and residuals – predict

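predict yhat                    // fitted values (xb, the default after regress)
predict uhat, residuals         // OLS residuals
summarize price yhat uhat
list price yhat uhat
correlate price yhat, means     // means also displays means, standard deviations, min and max
correlate weight uhat, means
correlate yhat uhat, means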
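The last two correlate commands illustrate algebraic properties of OLS with an intercept: the residuals average zero and are uncorrelated with both the regressor and the fitted values. A sketch that checks this and draws the graph the section heading mentions, assuming auto.dta; storing yhat and uhat as double (rather than the default float used above) is a choice made here to keep rounding error small, and the scalar names are illustrative.

```stata
sysuse auto, clear
quietly regress price weight
predict double yhat, xb
predict double uhat, residuals
// Mean residual should be zero up to rounding
summarize uhat, meanonly
scalar mean_uhat = r(mean)
// Correlations of the residuals with the regressor and with the fitted values
quietly correlate weight uhat
scalar rho_wu = r(rho)
quietly correlate yhat uhat
scalar rho_yu = r(rho)
display "mean(uhat) = " mean_uhat "   corr(weight,uhat) = " rho_wu "   corr(yhat,uhat) = " rho_yu
// Scatter of the sample data with the OLS sample regression line
twoway (scatter price weight) (line yhat weight, sort)
```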

(III.) A Worked Example:

1.0 Introduction

Basic commands:

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
mkdir /Users/lisading/Desktop/regstata
cd /Users/lisading/Desktop/regstata
save elemapi

(Now the data file is saved as /Users/lisading/Desktop/regstata/elemapi.dta.)

cd /Users/lisading/Desktop/regstata
use elemapi

1. A First Regression Analysis

We perform a regression analysis using the variables api00, acs_k3, meals and full.

api00: the academic performance of the school
acs_k3: the average class size in kindergarten through 3rd grade
meals: the percentage of students receiving free meals, an indicator of poverty
full: the percentage of teachers who have full teaching credentials

We expect better academic performance to be associated with smaller class sizes, fewer students receiving free meals, and a higher percentage of teachers with full teaching credentials.

regress api00 acs_k3 meals full

Stata output:

Output Analysis:

The coefficient on average class size (acs_k3, b = -2.68) is not statistically significant at the 0.05 level (p = 0.055). Its negative sign would indicate that larger class sizes are associated with lower academic performance.

The effect of meals (b = -3.70, p < 0.001) is statistically significant, and its negative coefficient indicates that the greater the percentage of students receiving free meals, the lower the academic performance. Thus, higher levels of poverty are associated with lower academic performance.

The coefficient on the percentage of teachers with full credentials (full, b = 0.11, p = 0.232) is not statistically significant, so in this model full appears unrelated to academic performance and does not seem to be an important predictor of it.
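To make the size of the meals effect concrete, the fitted model and a ceteris paribus calculation can be written out. The 10-percentage-point change in meals is an arbitrary value chosen for illustration, and the intercept is left symbolic because its estimate is not reported in these notes.

```latex
\widehat{\text{api00}}_i = \hat\beta_0 + \hat\beta_1\,\text{acs\_k3}_i + \hat\beta_2\,\text{meals}_i + \hat\beta_3\,\text{full}_i,
\qquad \hat\beta_1 = -2.68,\quad \hat\beta_2 = -3.70,\quad \hat\beta_3 = 0.11

\Delta\widehat{\text{api00}} = \hat\beta_2\,\Delta\text{meals} = (-3.70)(10) = -37
\quad \text{(holding acs\_k3 and full fixed)}
```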

2. Examining data

Commands:

describe
list in 1/5
list api00 acs_k3 meals full in 1/10
codebook api00 acs_k3 meals full yr_rnd

—> There are numerous missing values for meals.

summarize api00 acs_k3 meals full
summarize acs_k3, detail
tabulate acs_k3
list snum dnum acs_k3 if acs_k3 < 0
list dnum snum api00 acs_k3 meals full if dnum == 140
histogram acs_k3
graph box acs_k3
stem acs_k3

—> There were negatives accidentally inserted before some of the class sizes (acs_k3).

stem full
tabulate full
tabulate dnum if full <= 1
count if dnum==401
graph matrix api00 acs_k3 meals full, half

—> Over a quarter of the values for full were proportions instead of percentages.

So far, we have identified three problems in our data.

The corrected version of the data is called elemapi2.
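These notes do not show how elemapi2 was produced. A minimal sketch of recodes that would address the problems found above, run on a few made-up observations (hypothetical values, not taken from elemapi) so that it is self-contained. It assumes every negative class size is a sign error and that any value of full at or below 1 is a proportion, which would misclassify a school whose true percentage really is 1% or less.

```stata
clear
// Hypothetical toy rows mimicking the problems found above (not real elemapi data)
input snum acs_k3 meals full
1  19 67 76
2 -21 42 .81
3 -20  . .90
4  18 30 100
end
// Negative class sizes: treat as sign errors and flip the sign
replace acs_k3 = -acs_k3 if acs_k3 < 0
// Proportions in full: rescale to percentages
replace full = 100 * full if full <= 1
// Missing meals values cannot be recovered by recoding; just count them
count if missing(meals)
scalar n_miss = r(N)
list
```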

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2
regress api00 acs_k3 meals full

Stata Output:

save elemapi2

3. Simple Linear Regression

See Annotated Stata Output
