Source-code library: RegressionDialog.cxx
Language: C++
/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * This file is part of the LibreOffice project.
 *
 * This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/.
 */
1. Linear regression fits, using the data, a linear function between the dependent variable and the independent variable(s). The basic form of this function is :-

       y = b + m_1*x_1 + m_2*x_2 + ... + m_k*x_k

   where y is the dependent variable,
         x_1, x_2, ..., x_k are the k independent variables,
         b is the intercept, and
         m_1, m_2, ..., m_k are the slopes corresponding to the variables x_1, x_2, ..., x_k respectively.
   This equation for n observations can be written compactly using matrices as :-

       y = X*A

   where y is the n-dimensional column vector containing the dependent variable observations,
         X is a matrix of shape n*(k+1) in which each row looks like [ 1 x_1 x_2 ... x_k ], and
         A is the (k+1)-dimensional column vector [ b m_1 m_2 ... m_k ].
Calc formula LINEST(Y_array ; X_array) can be used to compute all entries in "A" along with many other statistics.
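As a minimal standalone sketch (not part of this dialog's code), the k = 1 case of the model above can be solved in closed form; LINEST generalizes this to k independent variables:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Closed-form ordinary-least-squares fit of y = b + m*x (one independent
// variable). Returns the intercept b and slope m, the entries of "A" that
// LINEST would compute for k = 1.
struct LinearFit { double b; double m; };

LinearFit fitSimpleLinear(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    const double b = (sy - m * sx) / n;
    return { b, m };
}
```

For data generated exactly by y = 1 + 2x, e.g. x = {1, 2, 3, 4}, y = {3, 5, 7, 9}, the fit recovers b = 1 and m = 2.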
2. Logarithmic regression is used to find a linear function between the dependent variable and the natural logarithm of the independent variable(s). So the basic form of this function is :-
y = b + m_1*ln(x_1) + m_2*ln(x_2) + ... + m_k*ln(x_k)
   This can again be written in compact matrix form for n observations :-

       y = ln(X)*A

   where y is the n-dimensional column vector containing the dependent variable observations,
         X is a matrix of shape n*(k+1) in which each row looks like [ e x_1 x_2 ... x_k ] (the leading e becomes 1 after taking ln), and
         A is the (k+1)-dimensional column vector [ b m_1 m_2 ... m_k ].
To estimate A, we use the formula =LINEST(Y_array ; LN(X_array))
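The same transform trick can be sketched in standalone C++ (a hypothetical illustration, not the dialog's code): run ordinary least squares on ln(x) instead of x, just as =LINEST(Y_array ; LN(X_array)) does:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fit the logarithmic model y = b + m*ln(x) by applying least squares to
// the transformed regressor u = ln(x) -- the one-variable analogue of
// =LINEST(Y_array ; LN(X_array)).
struct LogFit { double b; double m; };

LogFit fitLogarithmic(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double su = 0, sy = 0, suu = 0, suy = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const double u = std::log(x[i]); // transform: u = ln(x)
        su += u; sy += y[i];
        suu += u * u; suy += u * y[i];
    }
    const double m = (n * suy - su * sy) / (n * suu - su * su);
    const double b = (sy - m * su) / n;
    return { b, m };
}
```

For x = {1, e, e^2, e^3} and y = 1 + 2*ln(x) = {1, 3, 5, 7}, the fit recovers b = 1, m = 2.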
3. Power regression is used to fit the following model :-
y = b * (x_1 ^ m_1) * (x_2 ^ m_2) * ... * (x_k ^ m_k)
   To reduce this to a linear function (so that we can still use LINEST()), we take the natural logarithm of both sides :-
ln(y) = c + m_1*ln(x_1) + m_2*ln(x_2) + ... + m_k*ln(x_k) ; where c = ln(b)
   This again can be written compactly in matrix form as :-

       ln(y) = ln(X)*A

   where y is the n-dimensional column vector containing the dependent variable observations,
         X is a matrix of shape n*(k+1) in which each row looks like [ e x_1 x_2 ... x_k ], and
         A is the (k+1)-dimensional column vector [ c m_1 m_2 ... m_k ].
To estimate A, we use the formula =LINEST(LN(Y_array) ; LN(X_array))
Once we get A, to get back y from x's we use the formula :-
y = exp( ln(X)*A )
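The full round trip for the power model (linearize, fit, back-transform) can be sketched as standalone C++ (an illustrative example, not the dialog's code), mirroring =LINEST(LN(Y_array) ; LN(X_array)) followed by y = exp( ln(X)*A ):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fit the power model y = b * x^m by least squares on ln(y) = c + m*ln(x),
// with c = ln(b); then predict via the back-transform y = b * x^m,
// which equals exp(c + m*ln(x)).
struct PowerFit { double b; double m; };

PowerFit fitPower(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double su = 0, sv = 0, suu = 0, suv = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const double u = std::log(x[i]); // ln(x)
        const double v = std::log(y[i]); // ln(y)
        su += u; sv += v;
        suu += u * u; suv += u * v;
    }
    const double m = (n * suv - su * sv) / (n * suu - su * su);
    const double c = (sv - m * su) / n;
    return { std::exp(c), m }; // b = exp(c)
}

double predictPower(const PowerFit& f, double x)
{
    return f.b * std::pow(x, f.m); // same as exp(c + m*ln(x))
}
```

For data generated exactly by y = 2*x^3, e.g. x = {1, 2, 3, 4}, y = {2, 16, 54, 128}, the fit recovers b = 2, m = 3, and predictPower at x = 5 gives 250.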
Some references for computing confidence intervals for the regression coefficients :-
    // max col of our output should account for
    // 1. constant term column,
    // 2. mnNumIndependentVars columns
    // 3. Actual Y column
    // 4. Predicted Y column
    // 5. Residual Column
    SCCOL nOutputMaxCol = mOutputAddress.Col() + mnNumIndependentVars + 3;
    if (!bYHasSingleDim)
    {
        if (bGroupedByColumn)
            mxErrorMessage->set_label(ScResId(STR_MESSAGE_YVARIABLE_MULTI_COLUMN));
        else
            mxErrorMessage->set_label(ScResId(STR_MESSAGE_YVARIABLE_MULTI_ROW));
        return false;
    }
    rTemplate.setTemplate(constTemplateLINEST[nRegressionIndex].
                              replaceFirst("%CALC_INTERCEPT%",
                                           mbCalcIntercept ? std::u16string_view(u"TRUE")
                                                           : std::u16string_view(u"FALSE")));
    rOutput.writeMatrixFormula(rTemplate.getTemplate(), 1 + mnNumIndependentVars, 5);

    // Add LINEST result components to template
    // 1. Add ranges for coefficients and standard errors for independent vars and the intercept.
    //    Note that these two are in the reverse order (m_n, m_n-1, ..., m_1, b) w.r.t. what we expect.
    rTemplate.autoReplaceRange(u"%COEFFICIENTS_REV_RANGE%"_ustr,
                               ScRange(rOutput.current(), rOutput.current(mnNumIndependentVars)));
    rTemplate.autoReplaceRange(u"%SERRORSX_REV_RANGE%"_ustr,
                               ScRange(rOutput.current(0, 1), rOutput.current(mnNumIndependentVars, 1)));
    // 2. Add R-squared and standard error for y estimate.
    rTemplate.autoReplaceAddress(u"%RSQUARED_ADDR%"_ustr, rOutput.current(0, 2));
    rTemplate.autoReplaceAddress(u"%SERRORY_ADDR%"_ustr, rOutput.current(1, 2));
    // 3. Add F statistic and degrees of freedom
    rTemplate.autoReplaceAddress(u"%FSTATISTIC_ADDR%"_ustr, rOutput.current(0, 3));
    rTemplate.autoReplaceAddress(u"%DoFRESID_ADDR%"_ustr, rOutput.current(1, 3));
    // 4. Add regression sum of squares and residual sum of squares
    rTemplate.autoReplaceAddress(u"%SSREG_ADDR%"_ustr, rOutput.current(0, 4));
    rTemplate.autoReplaceAddress(u"%SSRESID_ADDR%"_ustr, rOutput.current(1, 4));
// Re-write all observations in group-by-column mode with predictions and residuals
void ScRegressionDialog::WritePredictionsWithResiduals(AddressWalkerWriter& rOutput,
                                                       FormulaTemplate& rTemplate,
                                                       size_t nRegressionIndex)
{
    bool bGroupedByColumn = mGroupedBy == BY_COLUMN;
    rOutput.newLine();
    rOutput.push();
    // Range of X variables with rows as observations and columns as variables.
    ScRange aDataMatrixRange(rOutput.current(0, 1),
                             rOutput.current(mnNumIndependentVars - 1, mnNumObservations));
    rTemplate.autoReplaceRange(u"%XDATAMATRIX_RANGE%"_ustr, aDataMatrixRange);
    // Write X variable names
    for (size_t nXvarIdx = 1; nXvarIdx <= mnNumIndependentVars; ++nXvarIdx)
    {
        // Here we write the X variables without any transformation (LN)
        rOutput.writeFormula(GetXVariableNameFormula(nXvarIdx, false));
        rOutput.nextColumn();
    }
    rOutput.reset();

    // Write the X data matrix
    rOutput.nextRow();
    OUString aDataMatrixFormula = bGroupedByColumn ? u"=%VARIABLE1_RANGE%"_ustr
                                                   : u"=TRANSPOSE(%VARIABLE1_RANGE%)"_ustr;
    rTemplate.setTemplate(aDataMatrixFormula);
    rOutput.writeMatrixFormula(rTemplate.getTemplate(), mnNumIndependentVars, mnNumObservations);