Linear Regression Calculator
Calculate the least squares regression line, slope, intercept, R², and correlation coefficient for your data. Make predictions using the fitted model. See also our Correlation Calculator, R-Squared Calculator, and Standard Deviation Calculator.
How to Use the Linear Regression Calculator
Linear regression finds the best-fitting straight line through a set of data points by minimizing the sum of squared residuals (differences between observed and predicted values). This method, known as Ordinary Least Squares (OLS), produces the line that best describes the linear relationship between an independent variable x and a dependent variable y.
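To see the least-squares property concretely, the sketch below fits a line to the five-point dataset from the Example Calculation. It then checks that nudging either coefficient away from the fitted values never lowers the sum of squared residuals. This is a minimal Python illustration, not the calculator's own code. The helper names `fit_ols` and `ssr` are hypothetical. The slope uses the deviation form Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically equivalent to the sum-based formula in the Formula section.

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def ssr(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_ols(xs, ys)
best = ssr(xs, ys, b0, b1)

# Moving either coefficient away from the OLS solution should never lower the SSR.
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1), (0.05, -0.05)]:
    assert ssr(xs, ys, b0 + db0, b1 + db1) >= best

print(f"b0={b0:.2f}, b1={b1:.2f}, SSR={best:.2f}")  # prints b0=2.20, b1=0.60, SSR=2.40
```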
Enter your data points as x,y pairs using the input rows, adding or removing rows as needed. The calculator requires at least two data points, and at least two of the x values must differ, because otherwise the slope is undefined. The standard error additionally needs at least three points, since it divides by n - 2. Larger datasets give more reliable estimates. To estimate y for a new x value, enter that x in the prediction field; the estimate comes from the fitted regression line. Click Calculate to see the slope, intercept, correlation coefficient, R-squared value, and standard error of the regression.
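Here is a minimal sketch of the input handling described above. It assumes each row arrives as an "x,y" text string. That format is hypothetical, because the page does not document how the calculator stores its rows. `parse_rows` and `fit_and_predict` are illustrative names. Besides the two-point minimum, the sketch rejects data whose x values are all equal, because the slope's denominator nΣx² - (Σx)² is then zero.

```python
def parse_rows(rows):
    """Turn strings like "3,5" into (x, y) float pairs, skipping blank rows."""
    points = []
    for row in rows:
        if not row.strip():
            continue  # an empty input row is simply ignored
        x_text, y_text = row.split(",")
        points.append((float(x_text), float(y_text)))
    return points


def fit_and_predict(points, x_new):
    """Fit y-hat = b0 + b1*x by OLS and return (b0, b1, prediction at x_new)."""
    if len(points) < 2:
        raise ValueError("at least two data points are required")
    xs = [p[0] for p in points]
    if len(set(xs)) < 2:
        raise ValueError("x values must not all be equal (slope undefined)")
    n = len(points)
    sx = sum(xs)
    sy = sum(p[1] for p in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1, b0 + b1 * x_new


rows = ["1,2", "2,4", "3,5", "", "4,4", "5,5"]
b0, b1, y_hat = fit_and_predict(parse_rows(rows), 3.5)
print(f"y-hat = {b0:.2f} + {b1:.2f}x; prediction at x=3.5: {y_hat:.2f}")
```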
The slope (b₁) tells you how much y changes for each one-unit increase in x. The intercept (b₀) is the predicted value of y when x equals zero. Together they form the equation ŷ = b₀ + b₁x. The R² value indicates what proportion of the variance in y is explained by the linear relationship with x, ranging from 0 (no explanatory power) to 1 (perfect fit).
The standard error of the regression measures the typical size of the residuals — how far observed values tend to fall from the regression line. A smaller standard error indicates the model's predictions are more precise. This value is also used to construct confidence intervals and prediction intervals around the regression line, quantifying the uncertainty in your estimates.
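You can compute the residuals and standard error directly from a fitted line. This Python sketch uses a hypothetical function name. It takes the least-squares coefficients b₀ = 2.2 and b₁ = 0.6 for the five-point dataset used in the Example Calculation. It returns both the residuals and SE = √(SS_res / (n - 2)).

```python
import math


def regression_standard_error(xs, ys, b0, b1):
    """Return (SE, residuals) for the fitted line y-hat = b0 + b1*x."""
    n = len(xs)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    # Two degrees of freedom are used up by estimating b0 and b1.
    return math.sqrt(ss_res / (n - 2)), residuals


se, residuals = regression_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
print([round(r, 2) for r in residuals], round(se, 3))  # SE is about 0.894
```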
Formula
Slope (b₁):
b₁ = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
Intercept (b₀):
b₀ = ȳ - b₁x̄
Correlation Coefficient (r):
r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]
Coefficient of Determination (R²):
R² = 1 - SS_res / SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)²
Standard Error:
SE = √(SS_res / (n - 2))
Prediction Interval:
ŷ ± t_(α/2, n-2) × SE × √(1 + 1/n + (x₀ - x̄)²/Σ(xᵢ - x̄)²), where t_(α/2, n-2) is the critical value of the t distribution with n - 2 degrees of freedom
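The formulas above translate almost line for line into code. This is a minimal Python sketch with hypothetical function names, not the calculator's implementation. The prediction interval needs a t critical value. The sketch takes it as an argument `t_crit` rather than computing it. In practice it would come from a t table or a statistics library, for example `scipy.stats.t.ppf(1 - alpha/2, n - 2)`.

```python
import math


def regression_summary(xs, ys):
    """Apply the sum-based formulas: slope, intercept, r, R², SE."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - sy / n) ** 2 for y in ys)
    return {"b0": b0, "b1": b1, "r": r, "r2": 1 - ss_res / ss_tot,
            "se": math.sqrt(ss_res / (n - 2))}


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, upper) of y-hat ± t × SE × √(1 + 1/n + (x0 - x̄)² / Σ(xᵢ - x̄)²).

    t_crit must be supplied by the caller: the t critical value for the
    chosen confidence level with n - 2 degrees of freedom.
    """
    s = regression_summary(xs, ys)
    n = len(xs)
    x_bar = sum(xs) / n
    sxx_centered = sum((x - x_bar) ** 2 for x in xs)
    y_hat = s["b0"] + s["b1"] * x0
    half_width = t_crit * s["se"] * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx_centered)
    return y_hat - half_width, y_hat + half_width


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print(regression_summary(xs, ys))
# 3.182 is the two-sided 95% t critical value for 3 degrees of freedom.
print(prediction_interval(xs, ys, 3.0, 3.182))
```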
Example Calculation
Data: (1,2), (2,4), (3,5), (4,4), (5,5)
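n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55
b₁ = (5×66 - 15×20) / (5×55 - 15²) = (330 - 300) / (275 - 225) = 30/50 = 0.6
b₀ = 20/5 - 0.6×(15/5) = 4 - 1.8 = 2.2
Equation: ŷ = 2.2 + 0.6x
Prediction at x=6: ŷ = 2.2 + 0.6(6) = 5.8 (x = 6 lies just outside the observed range of 1 to 5, so this is a mild extrapolation)
SS_res = 2.4, SS_tot = 6, so R² = 1 - 2.4/6 = 0.600 (60% of variance explained) and r = √0.600 ≈ 0.775
SE = √(2.4 / 3) ≈ 0.894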
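You can double-check the hand calculation with exact rational arithmetic. This Python sketch uses the standard-library `fractions.Fraction` type, so every intermediate value stays exact. It reproduces the column sums, the coefficients, the prediction at x = 6, and R² for the dataset above.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# The column sums used in the hand calculation.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Fractions keep every step exact, mirroring the arithmetic written out by hand.
b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
b0 = Fraction(sum_y, n) - b1 * Fraction(sum_x, n)
prediction = b0 + b1 * 6

y_bar = Fraction(sum_y, n)
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(sum_x, sum_y, sum_xy, sum_x2)   # 15 20 66 55
print(b1, b0, prediction, r_squared)  # 3/5 11/5 29/5 3/5
```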
Reference Table
| Relationship | r | R² (= r²) | Interpretation |
|---|---|---|---|
| Perfect Positive | 1.000 | 1.000 | Exact linear relationship |
| Strong Positive | 0.900 | 0.810 | 81% variance explained |
| Moderate Positive | 0.600 | 0.360 | 36% variance explained |
| Weak Positive | 0.300 | 0.090 | 9% variance explained |
| No Correlation | 0.000 | 0.000 | No linear relationship |
| Moderate Negative | -0.600 | 0.360 | 36% variance explained |
| Strong Negative | -0.900 | 0.810 | 81% variance explained |
| Perfect Negative | -1.000 | 1.000 | Exact negative linear relationship |
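In simple linear regression, R² is exactly the square of r. That is why each negative row in the table has the same R² as its positive counterpart. The sketch below checks this on a small made-up dataset with a downward trend. Both the data and the helper name `r_and_r2` are illustrative only.

```python
import math


def r_and_r2(xs, ys):
    """Return Pearson r and the regression R² = 1 - SS_res/SS_tot."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # this is SS_tot
    r = sxy / math.sqrt(sxx * syy)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return r, 1 - ss_res / syy


# A made-up, negatively related dataset: r is negative, R² is not.
r, r2 = r_and_r2([1, 2, 3, 4, 5], [9, 7, 8, 4, 3])
print(f"r = {r:.3f}, R² = {r2:.3f}, r² = {r * r:.3f}")
```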
Frequently Asked Questions
What is linear regression?
Linear regression is a statistical method that models the relationship between a dependent variable y and one or more independent variables x. With one predictor it fits a straight line to the observed data; with several predictors it fits a plane or hyperplane. The ordinary least squares (OLS) method minimizes the sum of squared differences between the observed values and the values predicted by the linear model. It is one of the most widely used techniques for predictive modeling and for understanding relationships between variables in science, engineering, economics, and the social sciences.
What does R-squared tell me?
R-squared (R²) is the coefficient of determination. It represents the proportion of variance in the dependent variable that is predictable from the independent variable. An R² of 0.85 means 85% of the variability in y is explained by the linear relationship with x. Values range from 0 to 1, and higher values indicate a better fit. However, a high R² does not prove causation. Adding predictors also never decreases R², even when those predictors are irrelevant, so use adjusted R² for multiple regression.
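Adjusted R² penalizes each added predictor: R²_adj = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors (p = 1 for simple regression). The Python sketch below uses hypothetical numbers to show the usual pattern. A third predictor nudges R² up from 0.850 to 0.855. Adjusted R² still goes down, because that small gain does not offset the lost degree of freedom.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: 30 observations, a 2-predictor model with R² = 0.850
# versus the same model with a third predictor added (R² = 0.855).
print(round(adjusted_r_squared(0.850, 30, 2), 4))
print(round(adjusted_r_squared(0.855, 30, 3), 4))
```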
What is the difference between correlation and regression?
Correlation measures the strength and direction of the linear relationship between two variables (r ranges from -1 to 1). Regression goes further by fitting a predictive model (y = b₀ + b₁x) that allows you to estimate y for any given x. Correlation is symmetric (r of x,y equals r of y,x), while regression is directional — the regression of y on x differs from x on y. Use correlation to describe association; use regression to predict or explain.
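The symmetry of r and the asymmetry of regression are easy to confirm numerically. This sketch uses hypothetical helper names and the Example Calculation dataset. Swapping the arguments leaves r unchanged but gives a different slope. The two slopes multiply to r², so the y-on-x and x-on-y lines coincide only when |r| = 1.

```python
import math


def slope(u, v):
    """OLS slope from regressing v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    return s_uv / s_uu


def pearson_r(u, v):
    """Pearson correlation; symmetric in its two arguments."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y), pearson_r(y, x))  # identical
print(slope(x, y), slope(y, x))          # 0.6 vs 1.0: different lines
# The product of the two slopes equals r².
print(slope(x, y) * slope(y, x), pearson_r(x, y) ** 2)
```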
How many data points do I need?
Technically, two points define a line, but meaningful regression requires more. A minimum of 10-20 observations is recommended for simple linear regression to get reliable estimates. The standard error and confidence intervals become more meaningful with larger samples. For multiple regression, a common rule of thumb is at least 10-15 observations per predictor variable. More data generally produces more stable and generalizable results.
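With only two points, the fitted line passes through both exactly. SS_res is then zero, and the standard error, which divides by n - 2, is undefined. This sketch uses made-up data and a hypothetical function name. It returns `None` for the standard error in that case, and shows that adding a third point leaves just one degree of freedom for estimating the spread.

```python
import math


def fit_with_se(xs, ys):
    """Return (b0, b1, se); se is None when n < 3 because n - 2 would be 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ss_res / (n - 2)) if n > 2 else None
    return b0, b1, se


print(fit_with_se([1, 3], [2, 6]))            # two points: exact fit, no SE
print(fit_with_se([1, 2, 3], [2, 4.5, 5.5]))  # three points: SE has 1 degree of freedom
```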
What are the assumptions of linear regression?
The key assumptions are: (1) Linearity — the relationship between x and y is linear; (2) Independence — observations are independent of each other; (3) Homoscedasticity — the variance of residuals is constant across all levels of x; (4) Normality — residuals are approximately normally distributed; (5) No multicollinearity (for multiple regression). Violations of these assumptions can lead to biased estimates, incorrect standard errors, and unreliable predictions.
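You can screen some of these assumptions informally from the residuals. The sketch below is a rough heuristic, not a formal test, and it does two things:

- It confirms that the residuals sum to roughly zero, which OLS with an intercept guarantees.
- As a crude homoscedasticity check, it compares the mean squared residual in the upper half of the x range with the lower half.

It does not assess independence or normality. The function name, the half-split rule, and the constructed data are all assumptions for illustration. A residual plot remains the usual first check.

```python
def residual_checks(xs, ys):
    """Informal residual summaries; not formal hypothesis tests."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals ordered by x, so patterns along x are easy to inspect.
    pairs = sorted(zip(xs, ys))
    resid = [y - (b0 + b1 * x) for x, y in pairs]
    half = n // 2  # for odd n, the middle residual is left out of both halves
    low, high = resid[:half], resid[n - half:]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return {
        "residual_sum": sum(resid),  # about 0 by construction when an intercept is fitted
        # Ratios far from 1 hint that the residual variance changes with x.
        "spread_ratio": mean_square(high) / mean_square(low),
    }


# Constructed so the OLS fit is exactly y = 2x, with residuals of ±0.1 at low x
# and ±1 at high x (a deliberately "fanning" pattern).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 5.9, 8.1, 11, 11, 13, 17]
out = residual_checks(xs, ys)
print(out)  # spread_ratio comes out at roughly 100
```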
Can I extrapolate beyond my data range?
Extrapolation means predicting y for x values outside the observed range. It is risky because the linear relationship may not hold beyond the data, and the further you extrapolate, the less reliable the prediction. Interpolation, which predicts within the data range, is generally safer, provided the linear model fits the observed data well. Always check whether the linear assumption is reasonable for the prediction range. If the relationship is non-linear, consider polynomial or exponential regression models instead.
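The prediction-interval formula makes the cost of extrapolating concrete. Its factor √(1 + 1/n + (x₀ - x̄)²/Σ(xᵢ - x̄)²) grows as x₀ moves away from x̄. This Python sketch uses a hypothetical function name. It flags whether x₀ lies inside the observed x range and prints that factor for the Example Calculation's x values. Keep in mind that the wider interval only reflects sampling uncertainty under the assumption that the line is correct. It cannot warn you if the true relationship bends outside the data.

```python
import math


def interval_multiplier(xs, x0):
    """Return (inside_range, sqrt(1 + 1/n + (x0 - x̄)²/Σ(xᵢ - x̄)²)).

    The multiplier scales t × SE in the prediction-interval formula, so it
    shows how the interval widens as x0 moves away from the mean of x.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    inside = min(xs) <= x0 <= max(xs)
    return inside, math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


xs = [1, 2, 3, 4, 5]
for x0 in [3, 5, 6, 10, 20]:
    inside, mult = interval_multiplier(xs, x0)
    label = "interpolation" if inside else "EXTRAPOLATION"
    print(f"x0={x0:>2}: {label:<13} multiplier={mult:.2f}")
```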