log using       Opens and names a log file to record your Stata session.
log close       Closes the log file of your Stata session.
cd              Changes the Stata working directory.
cf              Compares two datasets.
dir             Lists the contents of the current Stata working directory.
infile          Loads an unformatted text-format data file into memory.
insheet         Loads certain text-format data files into memory.
describe        Displays a summary of the current dataset in memory.
sort            Orders or re-orders the observations in the current dataset.
summarize       Displays descriptive summary statistics for numeric variables.
list            Displays values of some or all variables in the current dataset.
save            Saves current dataset in memory as a Stata-format dataset.
use             Loads a Stata-format data file into memory.
label variable  Assigns a label to a variable in a Stata-format dataset.
label data      Assigns a label to a Stata-format dataset.
exit            Ends a Stata session.
generate        Creates new numeric variables from expressions containing existing numeric variables, operators, and functions.
replace         Modifies the contents (values) of existing variables.
compare         Compares two variables; reports the differences and similarities between them.
drop            Deletes variables or observations from the dataset in memory.
label values    Attaches a previously defined value label to a variable.
label define    Defines a value label, which maps each distinct value of a variable to descriptive text.
graph bar       Draws bar charts of sample means of numeric variables.
correlate       Displays the correlation matrix for two or more numeric variables.
tabulate        Produces one-way and two-way frequency tables for categorical variables.
table           Produces one-way tables of statistics for the various categories (values) of categorical variables.
codebook        Displays properties of variables in the current dataset.
regress         Performs OLS estimation of linear regression models.
_b[varname]     Contains the coefficient estimate for the regressor varname.
_se[varname]    Contains the standard error of the coefficient estimate for the regressor varname.
vce             Displays the estimated covariance matrix of the coefficient estimates.
matrix get      Accesses coefficient estimates and the covariance matrix.
display         Computes and displays the values of algebraic expressions.
scalar          Defines the contents of scalar variables.
scalar list     Lists the names and values of currently defined scalar variables.
scalar drop     Eliminates previously defined scalars from memory.
matrix          Defines matrices and performs matrix computations.
matrix list     Lists the contents of a vector or matrix.
predict         Computes fitted (predicted) Y values and OLS residuals.
graph twoway    Draws scatterplots of sample data points and line graphs of OLS sample regression functions.
test            Used after OLS estimation to compute F-tests of linear restrictions on coefficients, such as equality restrictions.
return list     Lists the temporarily saved results of the most recent command, such as test.
lincom          Used after OLS estimation to compute estimates and two-tail t-tests of linear combinations of regression coefficients, including individual coefficients.
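Several of the post-estimation commands listed above are easiest to understand when seen together. The following is only a minimal sketch: it assumes the auto dataset that ships with Stata (loaded with sysuse auto), the scalar name t_weight and the matrix names b and V are made up for illustration, and it reads the stored results e(b) and e(V) rather than using matrix get.

```stata
* Assumption: Stata's bundled auto dataset stands in for the tutorial data
sysuse auto, clear

regress price weight mpg

* Coefficient estimate, standard error, and t-ratio for weight
display _b[weight]
display _se[weight]
scalar t_weight = _b[weight] / _se[weight]
scalar list t_weight

* Copy the coefficient vector and its covariance matrix into named matrices
matrix b = e(b)
matrix V = e(V)
matrix list b
matrix list V

* F-test of the joint restriction that both slope coefficients are zero
test weight mpg
return list

* t-test of a linear combination of coefficients
lincom weight + mpg

* Remove the illustrative scalar again
scalar drop t_weight
```

Here return list is issued directly after test, so the results it shows are those saved by test.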
Recording a Stata Session
    log using XX.log
    log using XX.log, replace
    log off
    log on
    log close
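To show how these logging commands fit together in one session, here is a small sketch. The file name session.log is hypothetical, and the auto dataset bundled with Stata is assumed as example data.

```stata
* Assumption: session.log is a hypothetical file name in the working directory
* Start the log, overwriting any existing file of the same name
log using session.log, replace

* Assumption: the bundled auto dataset serves as example data
sysuse auto, clear
describe

* Pause logging; the next command is not recorded
log off
list in 1/5

* Resume logging; the next command is recorded again
log on
summarize price

* Stop logging and close the file
log close
```

Without the replace option, log using stops with an error if the file already exists, which is why both forms appear above.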
Summarizing the contents of the current dataset – describe
Displaying the values of variables – list
    list
    list in 1/20
Calculating descriptive summary statistics – summarize
    summarize
    summarize price mpg weight if mpg > 20
Drawing Bar Charts – graph bar
graph bar (mean) varname, over(category)
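In the template above, varname and category are placeholders. As one possible concrete case, assuming the auto dataset bundled with Stata, the mean of price can be charted for each value of foreign:

```stata
* Assumption: the bundled auto dataset supplies price, mpg, and foreign
sysuse auto, clear

* One bar per category of foreign, showing the mean price in that category
graph bar (mean) price, over(foreign)

* The same chart for mpg, with an illustrative title
graph bar (mean) mpg, over(foreign) title("Mean mpg by car origin")
```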
Creating new variables from existing variables – generate
    generate weightsq = weight^2
    gen lnprice = ln(price)
    list price price1 in 1/20
    summarize price price1
    compare price price1
    drop price1 mpg1 weight1
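The notes do not show how a variable such as price1 is created. The sketch below is one hypothetical way to obtain a second variable to compare against price; the auto dataset and the adjustment of 100 are assumptions chosen only for illustration.

```stata
* Assumption: the bundled auto dataset supplies price and foreign
sysuse auto, clear

* price1 is a hypothetical copy of price, made so compare has two variables
generate price1 = price

* Change the copy for foreign cars only, so the two variables differ
replace price1 = price + 100 if foreign == 1

* Report how many observations are equal and how many differ
compare price price1
list price price1 foreign in 1/5

* Remove the illustrative variable again
drop price1
```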
Computing sample correlations and covariances – correlate
correlate price mpg weight foreign
OLS estimation of linear regression models – regress
Model: Y_i = price_i = β_0 + β_1 weight_i + u_i
regress price weight
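For a regression with a single regressor, the OLS slope equals the sample covariance of the two variables divided by the sample variance of the regressor, and the intercept equals the mean of the dependent variable minus the slope times the mean of the regressor. The sketch below checks this against regress. It assumes the auto dataset bundled with Stata, and the scalar names are made up for illustration.

```stata
* Assumption: the bundled auto dataset supplies price and weight
sysuse auto, clear

* Sample covariance of price and weight, and sample variance of weight
quietly correlate price weight, covariance
scalar b1_hand = r(cov_12) / r(Var_2)

* Sample means, needed for the intercept
quietly summarize price
scalar ybar = r(mean)
quietly summarize weight
scalar xbar = r(mean)
scalar b0_hand = ybar - b1_hand * xbar

regress price weight

* The hand-computed values should agree with the regress output up to rounding
display "slope:     " b1_hand "  vs  " _b[weight]
display "intercept: " b0_hand "  vs  " _b[_cons]
```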
Generating and Graphing Predicted Values and Residuals After OLS Estimation
    log using 351tutorial4.log
    describe
    summarize
    codebook price weight mpg foreign
    regress price weight
Calculating predicted values and residuals – predict
    predict yhat
    predict uhat, residuals
    summarize price yhat uhat
    list price yhat uhat
    correlate price yhat, means
    correlate weight uhat, means
    correlate yhat uhat, means
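The summarize and correlate commands above can be read as checks on standard OLS properties: each residual equals the observed value minus the fitted value, the residuals average to zero when the model includes a constant, and they are uncorrelated with the regressor. A minimal sketch of those checks follows, assuming the auto dataset bundled with Stata; the variable name check is hypothetical.

```stata
* Assumption: the bundled auto dataset supplies price and weight
sysuse auto, clear

regress price weight

* Fitted values (the default for predict) and OLS residuals
predict yhat
predict uhat, residuals

* price - yhat - uhat should be zero up to rounding for every observation
generate check = price - yhat - uhat
summarize check uhat

* The residuals should be uncorrelated with the regressor
correlate weight uhat
```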
    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
    mkdir /Users/lisading/Desktop/regstata
    cd /Users/lisading/Desktop/regstata
    save elemapi
(The data file is now saved as /Users/lisading/Desktop/regstata/elemapi.dta. To use it in a later session, change to that directory and load it.)
    cd /Users/lisading/Desktop/regstata
    use elemapi
We perform a regression analysis using the variables api00, acs_k3, meals and full.
api00: the academic performance of the school;
acs_k3: the average class size in kindergarten through 3rd grade;
meals: the percentage of students receiving free meals, which is an indicator of poverty;
full: the percentage of teachers who have full teaching credentials.
We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.
regress api00 acs_k3 meals full
The coefficient for average class size (acs_k3, b=-2.68) is not statistically significant at the 0.05 level (p=0.055). The coefficient is negative, which would indicate that larger class size is related to lower academic performance.
The effect of meals (b=-3.70, p<.001) is significant, and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. Thus, higher levels of poverty are associated with lower academic performance.
The percentage of teachers with full credentials (full, b=0.11, p=.232) is not significantly related to academic performance. This would seem to indicate that it is not an important factor in predicting academic performance.
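The p-values quoted in this interpretation are two-tail p-values of the t-ratio, that is, the coefficient divided by its standard error. The sketch below shows how such a value could be recomputed by hand for acs_k3. It assumes elemapi.dta was saved in the working directory as described earlier; the scalar name t_acs is made up for illustration.

```stata
* Assumption: elemapi.dta is in the current working directory
use elemapi, clear

regress api00 acs_k3 meals full

* t-ratio for acs_k3: coefficient divided by its standard error
scalar t_acs = _b[acs_k3] / _se[acs_k3]
display "t = " t_acs

* Two-tail p-value using the residual degrees of freedom stored by regress
display "p = " 2 * ttail(e(df_r), abs(t_acs))
```

The result should agree with the P>|t| column of the regress output for acs_k3.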
    describe
    list in 1/5
    list api00 acs_k3 meals full in 1/10
    codebook api00 acs_k3 meals full yr_rnd
—> There are numerous missing values for meals.
    summarize api00 acs_k3 meals full
    summarize acs_k3, detail
    tabulate acs_k3
    list snum dnum acs_k3 if acs_k3 < 0
    list dnum snum api00 acs_k3 meals full if dnum == 140
    histogram acs_k3
    graph box acs_k3
    stem acs_k3
—> There were negatives accidentally inserted before some of the class sizes (acs_k3).
    stem full
    tabulate full
    tabulate dnum if full <= 1
    count if dnum==401
    graph matrix api00 acs_k3 meals full, half
—> Over a quarter of the values for full were proportions instead of percentages.
So far, we have identified three problems in our data.
The corrected version of the data is called elemapi2.
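For illustration only, two of the three problems could be repaired directly with replace. This is a sketch under stated assumptions: that the negative class sizes differ from the true values only in sign, and that values of full at or below 1 are proportions to be rescaled. It is not necessarily how elemapi2 was produced, and it does not recover the missing values of meals.

```stata
* Assumption: elemapi.dta is in the current working directory
use elemapi, clear

* Assumption: negative class sizes differ from the true values only in sign
replace acs_k3 = abs(acs_k3) if acs_k3 < 0

* Assumption: values of full at or below 1 are proportions, not percentages
replace full = full * 100 if full <= 1

* The missing values of meals cannot be repaired this way; just count them
count if missing(meals)

* Re-check the repaired variables
summarize acs_k3 full
```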
    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2
    regress api00 acs_k3 meals full