**log using** Opens and names a log file to record your Stata session

**log close** Closes the log file of your Stata session

**cd** Changes the Stata working directory

**cf** Compares two datasets

**dir** Lists the contents of the current Stata working directory

**infile** Loads an unformatted text-format data file into memory

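**insheet** Loads tab- or comma-delimited text-format data files (for example, spreadsheet exports) into memory; superseded by **import delimited** in Stata 13 and later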

**describe** Displays a summary of the current dataset in memory

**sort** Orders or re-orders the observations in the current dataset

**summarize** Displays descriptive summary statistics for numeric variables

**list** Displays values of some or all variables in the current dataset

**save** Saves current dataset in memory as a Stata-format dataset

**use** Loads a Stata-format data file into memory

**label variable** Assigns a label to a variable in a Stata-format dataset

**label data** Assigns a label to a Stata-format dataset

**exit** Ends a Stata session

**generate** Creates new numeric variables from expressions containing existing numeric variables, operators, and functions.

**replace** Used to modify the contents (values) of existing variables.

**compare** Compares two variables; reports the differences and similarities between two variables.

**drop** Eliminates or deletes variables or observations from the dataset in memory.

**label values** Attaches a value label (created with label define) to a variable.

**label define** Defines a value label: a named set of text labels, one for each distinct numeric value of a variable.

**graph bar** Draws bar charts of sample means of numeric variables.

**correlate** Displays the correlation matrix (or, with the covariance option, the covariance matrix) for two or more numeric variables.

**tabulate** Produces one-way and two-way frequency tables for categorical variables.

**table** Produces one-way and multiway tables of summary statistics for the various categories (values) of categorical variables.

**codebook** Displays properties of variables in the current dataset.

**regress** Performs OLS estimation of linear regression models.

**_b[varname]** Contains the coefficient estimate for the regressor varname.

**_se[varname]** Contains the standard error of the coefficient estimate for the regressor varname.

**vce** Displays the estimated covariance matrix of the coefficient estimates (in current Stata versions, **estat vce**).

**matrix get** Accesses coefficient estimates and the covariance matrix.

**display** Computes and displays the values of algebraic expressions.

**scalar** Defines the contents of scalar variables.

**scalar list** Lists the names and values of currently defined scalar variables.

**scalar drop** Eliminates previously defined scalars from memory.

**matrix** Defines matrices and performs matrix computations.

**matrix list** Lists the contents of a vector or matrix.

**predict** Computes fitted values (estimated Y-values) and OLS residuals after estimation.

**graph twoway** Draws scatterplots of sample data points and line graphs of OLS sample regression functions.

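**test** Used after OLS estimation to compute F-tests of linear equality restrictions on the coefficients.

**return list** Lists the results temporarily saved in r() by the most recent r-class command, such as test.

**lincom** Used after OLS estimation to compute estimates, standard errors, two-tail t-tests, and confidence intervals for linear combinations of regression coefficients, including individual coefficients.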
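Several of the post-estimation tools listed above (_b[], _se[], display, scalar, test, return list, lincom) are not demonstrated later in these notes. The sketch below strings them together. It assumes Stata's built-in auto data (loaded with sysuse auto), and the choice of regressors and the scalar name t_mpg are only illustrative. It also shows a useful identity: for a single restriction, the F-statistic reported by test equals the square of the t-statistic.

```stata
* Sketch: post-estimation results after regress (assumes the built-in auto data)
sysuse auto, clear
regress price weight mpg foreign

* _b[] and _se[] hold each coefficient estimate and its standard error
display _b[mpg]
display _se[mpg]

* store the t-statistic for H0: coefficient on mpg = 0 in a scalar
scalar t_mpg = _b[mpg]/_se[mpg]
scalar list t_mpg

* F-test of one restriction; r(F) should equal t_mpg squared
test mpg = 0
return list                 // r(F), r(df), r(df_r), r(p) saved by test
display r(F) - t_mpg^2      // essentially zero

* joint F-test that the mpg and foreign coefficients are both zero
test mpg foreign

* lincom: estimate, standard error, t-test and confidence interval
lincom mpg                  // matches the mpg row of the regress output
lincom 10*mpg               // effect on price of a 10-unit increase in mpg

scalar drop t_mpg
```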

Recording a Stata session – **log**

```
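log using XX.log              // open a new log file named XX.log (fails if it already exists)
log close                     // close it again
log using XX.log, replace     // reopen, overwriting XX.log if it already exists
log off                       // temporarily suspend logging
log on                        // resume logging
log close                     // close the log file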
```

Summarizing the contents of the current dataset – **describe**

```
describe
```

Displaying the values of variables – **list**

```
list
list in 1/20
```

Calculating descriptive summary statistics – **summarize**

```
summarize
summarize price mpg weight if mpg > 20
```
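summarize reports the sample mean and standard deviation, where the sample variance is the sum of squared deviations from the mean divided by N - 1, and it leaves these results behind in r(). A minimal sketch that reproduces the variance of price from that formula, assuming the built-in auto data; the scalar names pbar, pvar, pn and the variable dev2 are arbitrary.

```stata
* Sketch: what summarize computes (assumes the built-in auto data)
sysuse auto, clear
summarize price, detail      // detail adds percentiles, skewness and kurtosis
return list                  // r(N), r(mean), r(Var), r(sd), ...

* reproduce the sample variance: sum of squared deviations / (N - 1)
quietly summarize price
scalar pbar = r(mean)
scalar pvar = r(Var)
scalar pn = r(N)
generate double dev2 = (price - pbar)^2
quietly summarize dev2
display r(sum)/(pn - 1)      // should match pvar
display pvar
display sqrt(pvar)           // the standard deviation reported by summarize
```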

Drawing Bar Charts – **graph bar**

```
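* general form: graph bar (mean) varname, over(category)
graph bar (mean) price, over(foreign)   // e.g., mean price for domestic vs. foreign cars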
```

Creating new variables from existing variables – **generate**

```
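generate weightsq = weight^2        // square of weight
gen lnprice = ln(price)             // natural log of price; gen abbreviates generate
generate price1 = price             // assumed: price1 as a copy of price, so the lines below run
list price price1 in 1/20
summarize price price1
compare price price1                // should report price1 == price for every observation
drop price1                         // mpg1 and weight1 are never created here, so only price1 is dropped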
```
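The block above does not exercise replace or the label commands from the list at the top. A minimal sketch that builds a two-category variable with generate and replace, then attaches a variable label and a value label, assuming the built-in auto data; the names wtclass and wtlbl and the 3,000 lb cutoff are hypothetical choices for illustration.

```stata
* Sketch: generate, replace, and labels (assumes the built-in auto data)
sysuse auto, clear
generate wtclass = 1                      // start every car in class 1
replace wtclass = 2 if weight >= 3000     // move heavier cars (3,000 lbs or more) to class 2
* auto has no missing weights; with missing data, add & !missing(weight) above

* variable label, then a value label defined and attached to the variable
label variable wtclass "Weight class"
label define wtlbl 1 "Under 3,000 lbs" 2 "3,000 lbs or more"
label values wtclass wtlbl

* one-way and two-way frequency tables now display the labels
tabulate wtclass
tabulate wtclass foreign
```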

Computing sample correlations and covariances – **correlate**

```
correlate price mpg weight foreign
```
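The section title mentions covariances, but the command shown reports only correlations. The covariance option of correlate reports the sample covariance matrix instead, and in both cases the matrix is left in r(C). A minimal sketch, again assuming the built-in auto data; the matrix name C is arbitrary.

```stata
* Sketch: correlations vs. covariances (assumes the built-in auto data)
sysuse auto, clear
correlate price mpg weight foreign               // correlation matrix
correlate price mpg weight foreign, covariance   // covariance matrix
matrix C = r(C)                                  // covariance matrix from the last correlate
matrix list C

* with exactly two variables, the correlation is also stored in r(rho)
quietly correlate price weight
display r(rho)
```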

OLS estimation of linear regression models – **regress**, $Y_i = \text{price}_i = \beta_0 + \beta_1\,\text{weight}_i + u_i$

```
regress price weight
```
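After regress, the coefficient vector and its estimated covariance matrix are stored as e(b) and e(V), which is what matrix, matrix list, and vce work with. For a simple regression like this one, the OLS estimates also follow from sample moments: $\hat\beta_1 = \widehat{\mathrm{Cov}}(\text{price}, \text{weight}) / \widehat{\mathrm{Var}}(\text{weight})$ and $\hat\beta_0 = \overline{\text{price}} - \hat\beta_1\,\overline{\text{weight}}$. The sketch below checks both formulas, assuming the built-in auto data; the scalar and matrix names are arbitrary.

```stata
* Sketch: stored results and the OLS formulas (assumes the built-in auto data)
sysuse auto, clear
regress price weight

matrix b = e(b)      // 1 x 2 row vector: coefficient on weight, then _cons
matrix V = e(V)      // 2 x 2 estimated covariance matrix of the estimates
matrix list b
matrix list V
estat vce            // the same covariance matrix (older versions used vce)

* slope = sample cov(price, weight) / sample var(weight)
quietly correlate price weight, covariance
matrix C = r(C)
scalar b1 = C[1,2]/C[2,2]

* intercept = mean(price) - slope * mean(weight)
quietly summarize price
scalar ybar = r(mean)
quietly summarize weight
scalar xbar = r(mean)
scalar b0 = ybar - b1*xbar

* all three differences should be essentially zero
display b1 - _b[weight]
display b0 - _b[_cons]
display sqrt(V[1,1]) - _se[weight]
```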

Generating and Graphing Predicted Values and Residuals After **OLS Estimation**

```
* assumes the auto data (price, weight, mpg, foreign) are already in memory
log using 351tutorial4.log
describe
summarize
codebook price weight mpg foreign
regress price weight
```

Calculating predicted values and residuals – **predict**

```
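predict yhat                       // fitted values (linear prediction, the default)
predict uhat, residuals            // OLS residuals: price - yhat
summarize price yhat uhat
list price yhat uhat
correlate price yhat, means        // means also shows means, standard deviations, min and max
correlate weight uhat, means       // residuals are uncorrelated with the regressor
correlate yhat uhat, means         // and with the fitted values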
```
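The correlations above illustrate algebraic properties of OLS: the residuals have mean zero and are uncorrelated with the regressor and with the fitted values, and the fitted values have the same mean as price. The section title also mentions graphing, which graph twoway handles by overlaying the data points and the fitted line. A minimal sketch that checks the properties numerically and draws the graph, assuming the built-in auto data; it reloads the data, so yhat and uhat are recreated (as doubles, to keep rounding error small).

```stata
* Sketch: OLS residual properties and a fitted-line graph (assumes the auto data)
sysuse auto, clear
regress price weight
predict double yhat, xb           // fitted values; xb is the default after regress
predict double uhat, residuals    // residuals: price - yhat

* mean of residuals, corr(weight, uhat), corr(yhat, uhat): all essentially zero
quietly summarize uhat
display r(mean)
quietly correlate weight uhat
display r(rho)
quietly correlate yhat uhat
display r(rho)

* the fitted values have the same mean as price
summarize price yhat

* scatterplot of the data with the OLS sample regression line
graph twoway (scatter price weight) (line yhat weight, sort)
```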

Basic commands:

```
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
mkdir /Users/lisading/Desktop/regstata
cd /Users/lisading/Desktop/regstata
save elemapi
```

(Now the data file is saved as /Users/lisading/Desktop/regstata/elemapi.dta.)

```
cd /Users/lisading/Desktop/regstata
use elemapi
```

We perform a regression analysis using the variables api00, acs_k3, meals and full.

**api00:** the academic performance of the school;
**acs_k3:** the average class size in kindergarten through 3rd grade;
**meals:** the percentage of students receiving free meals, which is an indicator of poverty;
**full:** the percentage of teachers who have full teaching credentials.

We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.

```
regress api00 acs_k3 meals full
```

**Stata output:**

**Output Analysis:**

The coefficient on average class size (acs_k3, b=-2.68) is not statistically significant at the 0.05 level (p=0.055). The coefficient is negative, which would indicate that larger class sizes are related to lower academic performance.

The effect of meals (b=-3.70, p=.000) is significant, and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. Thus, higher levels of poverty are associated with lower academic performance.

The percentage of teachers with full credentials (full, b=0.11, p=.232) seems to be unrelated to academic performance. This would suggest that, in this model, the percentage of teachers with full credentials is not an important factor in predicting academic performance.
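The p-values quoted above are two-tail p-values from the t distribution with the regression's residual degrees of freedom. As a sketch, the p-value for acs_k3 can be recomputed from _b[], _se[], and e(df_r). This assumes elemapi.dta, saved earlier, is in the current working directory; the scalar names t_acs and p_acs are arbitrary.

```stata
* Sketch: recompute the two-tail p-value for acs_k3
* (assumes elemapi.dta is in the current working directory)
use elemapi, clear
regress api00 acs_k3 meals full

scalar t_acs = _b[acs_k3]/_se[acs_k3]          // t-statistic for H0: coefficient = 0
scalar p_acs = 2*ttail(e(df_r), abs(t_acs))    // two-tail p-value
display "t = " t_acs "   p = " p_acs
```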

Commands:

```
describe
list in 1/5
list api00 acs_k3 meals full in 1/10
codebook api00 acs_k3 meals full yr_rnd
```

—> There are numerous missing values for meals.

```
summarize api00 acs_k3 meals full
summarize acs_k3, detail
tabulate acs_k3
list snum dnum acs_k3 if acs_k3 < 0
list dnum snum api00 acs_k3 meals full if dnum == 140
histogram acs_k3
graph box acs_k3
stem acs_k3
```

—> Negative signs were accidentally inserted before some of the class sizes (acs_k3).

```
stem full
tabulate full
tabulate dnum if full <= 1
count if dnum==401
graph matrix api00 acs_k3 meals full, half
```

—> Over a quarter of the values for full were proportions instead of percentages.

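So far, we have identified three problems in our data: numerous missing values for meals, negative class sizes in acs_k3, and values of full recorded as proportions instead of percentages.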
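For readers who want to see how these problems could be counted and repaired by hand with replace, here is a sketch. It assumes elemapi.dta is in the working directory and that sign and scale are the only errors in acs_k3 and full; the missing values for meals cannot be recovered this way, so they are only counted.

```stata
* Sketch: count and repair the data problems
* (assumes elemapi.dta is in the working directory; sign and scale are the only errors)
use elemapi, clear

* problem 1: missing values for meals (counted only; they cannot be filled in)
count if missing(meals)

* problem 2: class sizes entered with a spurious minus sign
count if acs_k3 < 0
replace acs_k3 = -acs_k3 if acs_k3 < 0

* problem 3: values of full recorded as proportions rather than percentages
count if full <= 1
replace full = 100*full if full <= 1
```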

The corrected version of the data is called elemapi2.

```
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2
regress api00 acs_k3 meals full
```

**Stata output:**

```
save elemapi2
```
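To see how much the corrections matter, the two regressions can be placed side by side. A sketch using estimates store and estimates table, assuming both elemapi.dta and elemapi2.dta are in the working directory; the stored names original and corrected are arbitrary. The stats(N r2) option adds each model's number of observations and R-squared.

```stata
* Sketch: compare the regression before and after the data corrections
use elemapi, clear
quietly regress api00 acs_k3 meals full
estimates store original

use elemapi2, clear
quietly regress api00 acs_k3 meals full
estimates store corrected

* coefficients, standard errors, N and R-squared for both models
estimates table original corrected, b se stats(N r2)
```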