December 14, 2014

Stata Study Notes

Reference: Mike Abbott’s Stata Tutorials

##(I.) Commands:

log using Opens and names a log file to record your Stata session

log close Closes the log file of your Stata session

cd Changes the Stata working directory

cf Compares two datasets

dir Lists the contents of the current Stata working directory

infile Loads an unformatted text-format data file into memory

insheet Loads certain text-format data files into memory

describe Displays a summary of the current dataset in memory

sort Orders or re-orders the observations in the current dataset

summarize Displays descriptive summary statistics for numeric variables

list Displays values of some or all variables in the current dataset

save Saves current dataset in memory as a Stata-format dataset

use Loads a Stata-format data file into memory

label variable Assigns a label to a variable in a Stata-format dataset

label data Assigns a label to a Stata-format dataset

exit Ends a Stata session

generate Creates new numeric variables from expressions containing existing numeric variables, operators, and functions.

replace Used to modify the contents (values) of existing variables.

compare Compares two variables; reports the differences and similarities between two variables.

drop Eliminates or deletes variables or observations from the dataset in memory.

label values Attaches a previously defined value label to a variable.

label define Defines a value label, i.e. a named mapping from the numeric values of a variable to descriptive text.

graph bar Draws bar charts of sample means of numeric variables.

correlate Displays correlation matrix for two or more numeric variables.

tabulate Produces one-way and two-way frequency tables for categorical variables.

table Produces one-way tables of statistics for the various categories (values) of categorical variables.

codebook Displays properties of variables in the current data set.

regress Performs OLS estimation of linear regression models.

_b[varname] Contains the coefficient estimate for the regressor varname.

_se[varname] Contains the standard error of the coefficient estimate for the regressor varname.

vce Displays estimated covariance matrix of coefficient estimates.

matrix get Accesses coefficient estimates and the covariance matrix.

display Computes and displays the values of algebraic expressions.

scalar Defines the contents of scalar variables.

scalar list Lists the names and values of currently-defined scalar variables.

scalar drop Eliminates previously-defined scalars from memory.

matrix Defines matrices and performs matrix computations.

matrix list Lists contents of a vector or matrix.

predict Computes estimated Yi-values and OLS residuals.

graph twoway Draws scatterplots of sample data points and line graphs of OLS sample regression functions.

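test Used after OLS estimation to compute F-tests of linear restrictions on the coefficients, such as coefficient equality restrictions.

return list Lists the results temporarily saved in r() by the most recent command, such as test.

lincom Used after OLS estimation to compute estimates, standard errors, and two-tail t-tests for individual regression coefficients and linear combinations of them.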
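To show how several of the post-estimation commands listed above fit together, here is a minimal sketch. It assumes Stata's bundled auto dataset (loaded with sysuse auto) as stand-in data; the scalar names tweight and Fstat are made up for illustration. It reads the coefficient matrices through e(b) and e(V), which is one way to get at the results that the vce and matrix get entries describe.

```stata
* Assumption: the bundled auto dataset stands in for real data
sysuse auto, clear
regress price weight mpg

* Coefficient estimate and standard error for weight
display _b[weight]
display _se[weight]

* t-ratio for H0: the coefficient on weight is zero, kept as a scalar
scalar tweight = _b[weight] / _se[weight]
scalar list tweight

* F-test of the same single restriction; save r(F) before other commands overwrite it
test weight
scalar Fstat = r(F)
return list

* With one restriction, the F-statistic equals the squared t-ratio,
* so this difference should be zero up to rounding
display Fstat - tweight^2

* Estimated price change for a 1000-pound increase in weight, with its t-test
lincom 1000*weight

* Coefficient vector and covariance matrix of the estimates
matrix b = e(b)
matrix V = e(V)
matrix list b
matrix list V
```

Saving r(F) into a scalar right after test is a defensive habit: later commands that return results in r() will replace it.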

##(II.) Stata Basics

  • Recording a Stata Session

    log using XX.log

    log using XX.log, replace

    log off

    log on

    log close

  • Summarizing the contents of the current dataset – describe


  • Displaying the values of variables – list


    list in 1/20

  • Calculating descriptive summary statistics – summarize


    summarize price mpg weight if mpg > 20

  • Drawing Bar Charts – graph bar

    graph bar (mean) varname, over(category)

  • Creating new variables from existing variables – generate

    generate weightsq = weight^2

    gen lnprice = ln(price)

    list price price1 in 1/20

    summarize price price1

    compare price price1

    drop price1 mpg1 weight1

  • Computing sample correlations and covariances – correlate

    correlate price mpg weight foreign

  • OLS estimation of linear regression models – regress, for the model $Y_i = \text{price}_i = \beta_0 + \beta_1\,\text{weight}_i + u_i$

    regress price weight

  • Generating and Graphing Predicted Values and Residuals After OLS Estimation

    log using 351tutorial4.log



    codebook price weight mpg foreign

    regress price weight

  • Calculating predicted values and residuals – predict

    predict yhat

    predict uhat, residuals

    summarize price yhat uhat

    list price yhat uhat

    correlate price yhat, means

    correlate weight uhat, means

    correlate yhat uhat, means
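The three correlate commands above can be read as checks of standard algebraic properties of OLS with an intercept: the residuals average to zero, the fitted values have the same mean as the dependent variable, and the residuals are uncorrelated with both the regressor and the fitted values. A minimal sketch of those checks follows. It assumes the bundled auto dataset, and it stores the predictions as double (an addition not in the notes) so that rounding error stays well below the tolerances used.

```stata
* Assumption: the bundled auto dataset stands in for the tutorial data
sysuse auto, clear
regress price weight

* Fitted values and residuals, stored in double precision
predict double yhat
predict double uhat, residuals

* Property 1: residuals average to zero
summarize uhat
assert abs(r(mean)) < 1e-6

* Property 2: fitted values have the same mean as price
summarize yhat
scalar yhat_mean = r(mean)
summarize price
assert reldif(yhat_mean, r(mean)) < 1e-8

* Property 3: residuals are uncorrelated with the regressor and the fitted values
correlate weight uhat
assert abs(r(rho)) < 1e-6
correlate yhat uhat
assert abs(r(rho)) < 1e-6
```

If any assert fails, Stata stops with an error, so a clean run means all three properties hold for this sample.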

##(III.) A Worked Example:

####1.0 Introduction

Basic commands:


mkdir /Users/lisading/Desktop/regstata

cd /Users/lisading/Desktop/regstata

save elemapi

( Now the data file is saved as /Users/lisading/Desktop/regstata/elemapi.dta )

cd /Users/lisading/Desktop/regstata

use elemapi

#####1. A First Regression Analysis

We perform a regression analysis using the variables api00, acs_k3, meals and full.

The variables are:

  • api00: the academic performance of the school

  • acs_k3: the average class size in kindergarten through 3rd grade

  • meals: the percentage of students receiving free meals, which is an indicator of poverty

  • full: the percentage of teachers who have full teaching credentials

We expect that better academic performance would be associated with smaller class sizes, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.

regress api00 acs_k3 meals full
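Written out, the regress command above fits a model of roughly the following form. The β and u notation is an assumption added here for illustration; the expected signs restate the expectations given in the text.

```latex
\[
\text{api00}_i = \beta_0 + \beta_1\,\text{acs\_k3}_i + \beta_2\,\text{meals}_i + \beta_3\,\text{full}_i + u_i,
\qquad \text{expected signs: } \beta_1 < 0,\ \beta_2 < 0,\ \beta_3 > 0
\]
```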

Stata output:

Output Analysis:

The coefficient on average class size (acs_k3, b=-2.68) is not statistically significant at the 0.05 level (p=0.055). The coefficient is negative, which would indicate that larger class sizes are related to lower academic performance.

The effect of meals (b=-3.70, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. Thus, higher levels of poverty are associated with lower academic performance.

The percentage of teachers with full credentials (full, b=0.11, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance.

#####2. Examining data



list in 1/5

list api00 acs_k3 meals full  in 1/10

codebook api00 acs_k3 meals full yr_rnd

( → There are numerous missing values for meals. )

summarize api00 acs_k3 meals full

summarize acs_k3, detail

tabulate acs_k3

list snum dnum acs_k3 if acs_k3 < 0

list dnum snum api00 acs_k3 meals full if dnum == 140

histogram acs_k3

graph box acs_k3

stem acs_k3

( → Negative signs were accidentally inserted before some of the class sizes (acs_k3). )

stem full

tabulate full

tabulate dnum if full <= 1

count if dnum==401

graph matrix api00 acs_k3 meals full, half

( → Over a quarter of the values for full were proportions instead of percentages. )

So far, we have identified three problems in our data.

The corrected version of the data is called elemapi2.
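These notes do not show how elemapi2 was produced. As a hypothetical sketch only, two of the three problems (the stray negative signs on acs_k3 and the values of full entered as proportions) could be repaired with replace. The toy values below are invented for illustration and are not taken from elemapi. The third problem, the missing values of meals, cannot be fixed by recoding.

```stata
* Hypothetical toy data that mimics two of the problems found above
clear
input acs_k3 full
 19 76
-21 0.95
 20 100
-19 0.42
end

* Flip the sign of class sizes that were entered as negatives
replace acs_k3 = -acs_k3 if acs_k3 < 0

* Convert proportions to percentages; this assumes any value <= 1 is a proportion
replace full = full * 100 if full <= 1

list

* Sanity checks on the repaired toy data
assert acs_k3 > 0
assert full > 1
```

The cutoff full <= 1 mirrors the condition used with tabulate above; a school with a true value of 1 percent would be misclassified by it, so a rule like this needs checking against the actual data.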


regress api00 acs_k3 meals full

Stata Output:

save elemapi2

#####3. Simple Linear Regression

Annotated Stata Output

#####4. Reference: Mike Abbott’s Stata Tutorials
