Linear Regression Calculator

Calculate the least squares regression line, slope, intercept, R², and correlation coefficient for your data. Make predictions using the fitted model. See also our Correlation Calculator, R-Squared Calculator, and Standard Deviation Calculator.

How to Use the Linear Regression Calculator

Linear regression finds the best-fitting straight line through a set of data points by minimizing the sum of squared residuals (differences between observed and predicted values). This method, known as Ordinary Least Squares (OLS), produces the line that best describes the linear relationship between an independent variable x and a dependent variable y.

Enter your data points as x,y pairs using the input rows. You can add or remove rows as needed. The calculator requires at least two data points but works best with larger datasets. Enter a value in the prediction field to estimate y for a new x value using the fitted regression line. Click Calculate to see the slope, intercept, correlation coefficient, R-squared value, and standard error of the regression.

The slope (b₁) tells you how much y changes for each one-unit increase in x. The intercept (b₀) is the predicted value of y when x equals zero. Together they form the equation ŷ = b₀ + b₁x. The R² value indicates what proportion of the variance in y is explained by the linear relationship with x, ranging from 0 (no explanatory power) to 1 (perfect fit).

The standard error of the regression measures the typical size of the residuals — how far observed values tend to fall from the regression line. A smaller standard error indicates the model's predictions are more precise. This value is also used to construct confidence intervals and prediction intervals around the regression line, quantifying the uncertainty in your estimates.

Formula

Slope (b₁):

b₁ = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

Intercept (b₀):

b₀ = ȳ - b₁x̄

Correlation Coefficient (r):

r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]

Coefficient of Determination:

R² = 1 - SS_res / SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)²

Standard Error:

SE = √(SS_res / (n - 2))
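The goodness-of-fit formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's own implementation: the function name `fit_quality` and the sample data are hypothetical, and the sums are written in deviation form (for example Σ(xᵢ - x̄)²), which is algebraically equivalent to the raw-sum form shown in the formulas.

```python
import math

def fit_quality(xs, ys):
    """Return (r, r_squared, std_error) for a simple least squares fit."""
    n = len(xs)
    if n < 3:
        # SE divides by n - 2, so it is undefined for fewer than 3 points.
        raise ValueError("need at least 3 data points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # this is SS_tot
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("x and y must each take at least two distinct values")
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r = s_xy / math.sqrt(s_xx * s_yy)
    r_squared = 1 - ss_res / s_yy
    std_error = math.sqrt(ss_res / (n - 2))
    return r, r_squared, std_error

# Hypothetical sample data, chosen only for illustration.
r, r_squared, std_error = fit_quality([1, 2, 3, 4], [3, 5, 7, 10])
print(r, r_squared, std_error)
```

For simple linear regression with an intercept, the R² returned here equals r squared, which is a handy consistency check on any implementation.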

Prediction Interval:

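ŷ ± t_(α/2, n-2) × SE × √(1 + 1/n + (x₀ - x̄)²/Σ(xᵢ - x̄)²)

Here ŷ is the predicted value at x₀, and t_(α/2, n-2) is the critical value of Student's t-distribution with n - 2 degrees of freedom.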
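As a rough sketch of how the prediction interval could be computed, the Python function below applies the formula directly. The name `prediction_interval` and the sample data are hypothetical. Because the Python standard library has no t-distribution, the critical value `t_crit` is assumed to be looked up by the caller from a t-table using n - 2 degrees of freedom.

```python
import math

def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, upper) bounds for a new observation at x0.

    t_crit is the two-sided critical t value for n - 2 degrees of
    freedom, supplied by the caller (for example from a t-table).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ss_res / (n - 2))
    y_hat = b0 + b1 * x0
    # The interval widens as x0 moves away from the mean of x.
    half_width = t_crit * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width

# Hypothetical data: 4 points give 2 degrees of freedom, for which the
# 95% two-sided critical value from a t-table is about 4.303.
lower, upper = prediction_interval([1, 2, 3, 4], [3, 5, 7, 10], x0=5, t_crit=4.303)
print(lower, upper)
```

Passing the critical value in as a parameter keeps the sketch dependency-free; a statistics library could supply it instead.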

Example Calculation

Data: (1,2), (2,4), (3,5), (4,4), (5,5)

n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55

b₁ = (5×66 - 15×20) / (5×55 - 15²) = (330 - 300) / (275 - 225) = 30/50 = 0.6

b₀ = 20/5 - 0.6×(15/5) = 4 - 1.8 = 2.2

Equation: ŷ = 2.2 + 0.6x

Prediction at x=6: ŷ = 2.2 + 0.6(6) = 5.8

SS_res = 2.4, SS_tot = 6, so R² = 1 - 2.4/6 = 0.600 (60.0% of variance explained)
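Hand calculations like this one are easy to get wrong in a single sum, so it can help to recompute the intermediate values in code. The short Python sketch below rebuilds the sums from the five data points and applies the raw-sum formulas for the slope and intercept; the variable names are illustrative only.

```python
# The five data points from the example.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# Raw-sum formulas for the least squares slope and intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

print("n =", n, "sum_x =", sum_x, "sum_y =", sum_y,
      "sum_xy =", sum_xy, "sum_x2 =", sum_x2)
print("slope =", slope, "intercept =", intercept)
print("prediction at x = 6:", intercept + slope * 6)
```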

Reference Table

Dataset Type	r	R²	Interpretation
Perfect Positive	1.000	1.000	Exact linear relationship
Strong Positive	0.900	0.810	81% variance explained
Moderate Positive	0.600	0.360	36% variance explained
Weak Positive	0.300	0.090	9% variance explained
No Correlation	0.000	0.000	No linear relationship
Moderate Negative	-0.600	0.360	36% variance explained
Strong Negative	-0.900	0.810	81% variance explained
Perfect Negative	-1.000	1.000	Exact negative linear relationship

Frequently Asked Questions

What is linear regression?

Linear regression is a statistical method that models the relationship between a dependent variable y and one or more independent variables x by fitting a linear equation to observed data (a straight line when there is a single x). The ordinary least squares (OLS) method minimizes the sum of squared differences between observed values and the values predicted by the linear model. It is among the most widely used techniques for predictive modeling and for understanding relationships between variables in science, engineering, economics, and social sciences.

What does R-squared tell me?

R-squared (R²) is the coefficient of determination, representing the proportion of variance in the dependent variable that is predictable from the independent variable. An R² of 0.85 means 85% of the variability in y is explained by the linear relationship with x. Values range from 0 to 1, with higher values indicating a better fit. However, a high R² does not prove causation, and adding more predictors never decreases R², even when they are irrelevant (use adjusted R² for multiple regression).
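For readers comparing models with different numbers of predictors, the sketch below shows one way adjusted R² could be computed. The formula is the standard textbook adjustment, 1 - (1 - R²)(n - 1)/(n - p - 1), which is not stated elsewhere on this page; the function name `adjusted_r_squared` and the example numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjust R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical case: the same R-squared of 0.85 from 20 observations is
# penalized more heavily when it took 3 predictors to reach it than 1.
one_predictor = adjusted_r_squared(0.85, n=20, p=1)
three_predictors = adjusted_r_squared(0.85, n=20, p=3)
print(one_predictor, three_predictors)
```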

What is the difference between correlation and regression?

Correlation measures the strength and direction of the linear relationship between two variables (r ranges from -1 to 1). Regression goes further by fitting a predictive model (y = b₀ + b₁x) that allows you to estimate y for any given x. Correlation is symmetric (r of x,y equals r of y,x), while regression is directional — the regression of y on x differs from x on y. Use correlation to describe association; use regression to predict or explain.
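The asymmetry of regression can be seen numerically. The Python sketch below (function name and data are hypothetical) fits the slope both ways on the same points: the correlation is unchanged when x and y are swapped, while the two regression lines differ unless the fit is perfect.

```python
import math

def slopes_both_ways(xs, ys):
    """Return (slope of y on x, slope of x on y, r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope_y_on_x = s_xy / s_xx  # minimizes squared errors in y
    slope_x_on_y = s_xy / s_yy  # minimizes squared errors in x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope_y_on_x, slope_x_on_y, r

# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]
b_yx, b_xy, r = slopes_both_ways(xs, ys)

# Rewritten in the form y = ..., the x-on-y line has slope 1 / b_xy,
# which matches b_yx only when |r| = 1.
print("y on x slope:", b_yx)
print("x on y slope, expressed as dy/dx:", 1 / b_xy)
print("r:", r)
```

A useful identity for checking results is that the product of the two slopes equals r².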

How many data points do I need?

Technically, two points define a line, but the fit is then exact and the standard error, which divides by n - 2, is undefined, so meaningful regression requires more. A common rule of thumb is a minimum of 10-20 observations for simple linear regression to get reliable estimates. The standard error and confidence intervals become more meaningful with larger samples. For multiple regression, a common rule of thumb is at least 10-15 observations per predictor variable. More data generally produces more stable and generalizable results.

What are the assumptions of linear regression?

The key assumptions are: (1) Linearity — the relationship between x and y is linear; (2) Independence — observations are independent of each other; (3) Homoscedasticity — the variance of residuals is constant across all levels of x; (4) Normality — residuals are approximately normally distributed; (5) No multicollinearity (for multiple regression). Violations of these assumptions can lead to biased estimates, incorrect standard errors, and unreliable predictions.

Can I extrapolate beyond my data range?

Extrapolation (predicting y for x values outside the observed range) is risky because the linear relationship may not hold beyond the data. The further you extrapolate, the less reliable the prediction. Interpolation (predicting within the data range) is generally safer, provided the line fits the observed data well. Always check whether the linear assumption is reasonable for the prediction range. If the relationship is non-linear, consider polynomial or exponential regression models instead.