Linear and Logistic Regression in Python

Preparation

We load the iris flower data set of Anderson and Fisher. ¹

The data set contains 150 observations of five attributes:

sepal length (in cm),
sepal width (in cm),
petal length (in cm),
petal width (in cm), and
species (Iris setosa = 0, Iris versicolor = 1, and Iris virginica = 2).

We load it from the sklearn.datasets package with the function load_iris():

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

X.head()

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2
2	4.7	3.2	1.3	0.2
3	4.6	3.1	1.5	0.2
4	5.0	3.6	1.4	0.2

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
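As a quick sanity check of the description above, the following sketch confirms the number of observations and shows how often each species code occurs. It assumes scikit-learn 0.23 or later, where load_iris accepts as_frame=True; the names frame and counts are chosen here purely for illustration.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects (assumes scikit-learn >= 0.23)
iris = load_iris(as_frame=True)
frame = iris.frame  # four feature columns plus the "target" column

# Number of observations and attributes
print(frame.shape)

# Number of flowers per species code (0, 1, 2)
counts = frame["target"].value_counts().sort_index()
print(counts.to_dict())
```

Each of the three species should appear equally often, which matters later when the classification metrics are read per class.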

(Simple) Linear Regression

We begin with a simple linear regression: we build a model that predicts the petal length from the sepal length.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Target variable y_reg: petal length
y_reg = X["petal length (cm)"]
# Predictor variable X_reg: sepal length
X_reg = X.drop(columns=["petal length (cm)", "sepal width (cm)", "petal width (cm)"])

We now split our data into test and training data: \(0.2 = 20\,\%\) of the data become test data and \(1 - 0.2 = 0.8 = 80\,\%\) become training data.

The observations are assigned at random; setting random_state=2009 makes this random split reproducible.

# Split into test and training data
X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=2009)
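The effect of test_size and random_state can be checked directly. The following sketch is self-contained and uses the raw arrays from load_iris(return_X_y=True) rather than the data frames above; the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Two splits with the same random_state select the same rows
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.2, random_state=2009)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=2009)

print(len(X_tr1), len(X_te1))   # sizes of the training and test set
print((X_te1 == X_te2).all())   # True if both test sets are identical
```

With 150 observations, a test share of 0.2 gives 30 test rows and 120 training rows.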

We now create the linear regression model and train it on X_train and y_train:

# Model
linreg = LinearRegression()
linreg.fit(X_train, y_train)

LinearRegression()

To check the model, we let it predict the target values (y_pred) from our test data (X_test):

# Prediction
y_pred = linreg.predict(X_test)

Since y_test holds the actual values, we can compare them with the values predicted by the model and thereby evaluate the model.

For this we use the mean squared error (MSE) and the coefficient of determination \(R^2\). Note that the MSE below is computed on the test data, whereas linreg.score(X_reg, y_reg) evaluates \(R^2\) on the full data set (training and test data together).

print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", linreg.score(X_reg, y_reg))

MSE: 0.7036607653595639
R^2: 0.7593651994880084
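To make the two measures concrete, here is a minimal sketch that computes them by hand on the test data and compares the results with scikit-learn. The MSE is the mean of the squared residuals, and \(R^2\) is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch rebuilds the model from the raw arrays, so the variable names (x, t, model) are assumptions and differ from the cells above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data[:, [0]]   # sepal length as the only predictor (2D array)
t = iris.data[:, 2]     # petal length as the target

x_train, x_test, t_train, t_test = train_test_split(x, t, test_size=0.2, random_state=2009)
model = LinearRegression().fit(x_train, t_train)
t_pred = model.predict(x_test)

# MSE: mean of the squared residuals
mse_manual = np.mean((t_test - t_pred) ** 2)

# R^2: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((t_test - t_pred) ** 2)
ss_tot = np.sum((t_test - np.mean(t_test)) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(t_test, t_pred))
print(r2_manual, r2_score(t_test, t_pred), model.score(x_test, t_test))
```

Calling score on the test data alone gives a value that was not influenced by the rows used for fitting, which is usually the more conservative figure to report.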

We can determine the regression line from the coefficients and the intercept:

Coefficients: [1.87753167]
Intercept: -7.252744633173977

This yields the following equation for the regression line:

\[ \hat{y} = \hat{f}(x) \approx 1.8775 \cdot x - 7.2527 \]
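As a small worked example, the rounded equation can be evaluated for a single flower. The helper predict_petal_length is hypothetical and only wraps the formula above.

```python
def predict_petal_length(sepal_length_cm):
    """Evaluate the rounded regression line from the text (hypothetical helper)."""
    return 1.8775 * sepal_length_cm - 7.2527

# A sepal length of 6.0 cm gives 1.8775 * 6.0 - 7.2527
print(round(predict_petal_length(6.0), 4))  # 4.0123
```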

Let us look at the scatter plot of our test and training data together with the regression line:

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(8, 5))

# Scatter plot of the training and test data
plt.scatter(X_train, y_train, color="blue", label="Training values")
plt.scatter(X_test, y_test, color="green", label="Test values")

# Regression line: slope a and intercept b taken from the fitted model
a = round(float(linreg.coef_[0]), 4)
b = round(float(linreg.intercept_), 4)
absb = abs(b)
xp = np.linspace(4,8,200)
yp = a * xp + b
pm = "+" if b >= 0 else "-"
plt.plot(xp, yp, label=r'$\hat{y}$'+f'= {a}x {pm} {absb}', color='red')


plt.xlabel("Sepal length (cm)")
plt.ylabel("Petal length (cm)")
plt.title("Linear regression on the iris data")
plt.legend()
plt.grid(True)
plt.show()

(Multiple) Linear Regression

We now want to use all three other numeric variables as predictors. This leads to a multiple linear regression.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Target variable: petal length
y_reg = X["petal length (cm)"]

# Predictor variables: sepal length, sepal width, and petal width
X_reg = X.drop(columns=["petal length (cm)"])

# Split into test and training data
X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.20, random_state=2009)

# Model
linreg = LinearRegression()
linreg.fit(X_train, y_train)

# Prediction
y_pred = linreg.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", linreg.score(X_reg, y_reg))

MSE: 0.11749487412693703
R^2: 0.9675532979408549

print("Coefficients:", linreg.coef_)
print("Intercept:", linreg.intercept_)

Coefficients: [ 0.70649516 -0.66257783  1.49483392]
Intercept: -0.11204676270834524

We thus obtain the following regression function, where \(x_1\) is the sepal length, \(x_2\) the sepal width, and \(x_3\) the petal width:

\[ \hat{y} = \hat{f}(x_1, x_2, x_3) = 0.7065 \cdot x_1 - 0.6626 \cdot x_2 + 1.4948 \cdot x_3 - 0.112 \]
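A short worked example shows how the function is evaluated. It uses the rounded coefficients from the equation and the first flower of the data set (sepal length 5.1, sepal width 3.5, petal width 0.2); the variable names are illustrative.

```python
import numpy as np

# Rounded coefficients and intercept from the regression function above
coef = np.array([0.7065, -0.6626, 1.4948])
intercept = -0.112

# First flower of the data set: sepal length, sepal width, petal width (cm)
x = np.array([5.1, 3.5, 0.2])

# Dot product of coefficients and features, plus the intercept
y_hat = coef @ x + intercept
print(round(float(y_hat), 3))  # 1.471
```

The measured petal length of this flower is 1.4 cm, so the prediction is off by roughly 0.07 cm.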

Simple Logistic Regression

Our iris data consist of the lengths and widths of the sepals and petals of three iris species. In the variable y, these species are encoded as follows:

0 = setosa

1 = versicolor

2 = virginica

We first look only at Iris setosa: we set the target to 1 whenever a flower belongs to this species and to 0 otherwise:

y_setosa = np.where( y == 0, 1, 0)
y_setosa
X_setosa = X.drop(columns=["petal length (cm)", "sepal width (cm)", "petal width (cm)"])
X_setosa

	sepal length (cm)
0	5.1
1	4.9
2	4.7
3	4.6
4	5.0
...	...
145	6.7
146	6.3
147	6.5
148	6.2
149	5.9

150 rows × 1 columns
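The recoding with np.where can be illustrated on a tiny label vector. The vector y_demo is hypothetical and not part of the data set; the second variant, casting the boolean mask to integers, is an equivalent alternative.

```python
import numpy as np

# Small illustrative label vector (hypothetical, not the full data set)
y_demo = np.array([0, 0, 1, 2, 0, 2])

# 1 where the species code is 0 (setosa), 0 otherwise
via_where = np.where(y_demo == 0, 1, 0)

# Equivalent: cast the boolean mask to integers
via_astype = (y_demo == 0).astype(int)

print(via_where)   # [1 1 0 0 1 0]
print(via_astype)  # [1 1 0 0 1 0]
```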

Let us now look at our situation as box plots:

plt.figure(figsize=(8, 5))

plt.subplot(1,2,1)
setosa = X_setosa[y_setosa == 1]
notsetosa = X_setosa[y_setosa == 0]
plt.ylim(3,8)
setosa.boxplot()
plt.title("Iris setosa")
plt.subplot(1, 2, 2)
notsetosa.boxplot()
plt.ylim(3, 8)
plt.title("Not Iris setosa")


#grp =[X_setosa[y_setosa == g] for g in (0, 1)]
#plt.boxplot(grp, tick_labels=["Not I. setosa", "I. setosa"])
plt.show()

If we were to run a linear regression on only the ???, it would look as follows:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split into test and training data
X_train, X_test, y_train, y_test = train_test_split(X_setosa, y_setosa, test_size=0.2, random_state=2009)
#X_train
#y_train

linmod_linreg = LinearRegression()
linmod_linreg.fit(X_train, y_train)

LinearRegression()

Coefficients: [-0.40773779]
Intercept: 2.7296683561998614

The scatter plot:

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(8, 5))

# Scatter plot of the training data
plt.scatter(X_train, y_train, color="blue", label="Training values")
# plt.scatter(X_test, y_test, color="green", label="Test values")

# Regression line: slope a and intercept b taken from the fitted model
a = round(float(linmod_linreg.coef_[0]), 4)
b = round(float(linmod_linreg.intercept_), 4)
absb = abs(b)
xp = np.linspace(4,8,200)
yp = a * xp + b
pm = "+" if b >= 0 else "-"
plt.plot(xp, yp, label=r'$\hat{y}$'+f'= {a}x {pm} {absb}', color='red')


plt.xlabel("Sepal length (cm)")
plt.ylabel("Iris setosa (1 = yes, 0 = no)")
plt.title("Linear regression on the iris data")
plt.legend()
plt.grid(True)
plt.show()
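One way to see why a straight line is a poor model for a 0/1 target is to evaluate it at the edges of the plotted range. The sketch below uses the coefficient and intercept reported for the linear model above; the variable names are illustrative.

```python
# Coefficient and intercept as reported for the linear model above
a = -0.40773779
b = 2.7296683561998614

# Evaluate the line at the edges of the plotted range (4 cm and 8 cm)
for x in (4.0, 8.0):
    print(x, a * x + b)

y_at_4 = a * 4.0 + b   # greater than 1
y_at_8 = a * 8.0 + b   # less than 0
```

Both values fall outside the interval from 0 to 1, so they cannot be read as probabilities.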

Instead of the linear approach, we want to model the probability that a given sepal length indicates an Iris setosa.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Model
logreg = LogisticRegression(max_iter=200)
logreg.fit(X_train, y_train)

LogisticRegression(max_iter=200)

a = round(logreg.coef_.item(), 4)
b = round(logreg.intercept_.item(), 4)
absb = abs(b)  # absolute value of the intercept, used in the plot label

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(8, 5))

# Scatter plot of the training and test data
plt.scatter(X_train, y_train, color="blue", label="Training values")
plt.scatter(X_test, y_test, color="green", label="Test values")

# Evaluate the fitted regression function on a grid
import pandas as pd

xp = pd.DataFrame(
    np.linspace(4, 8, 200),
    columns=["sepal length (cm)"]
)
yp = logreg.predict_proba(xp)[:,1]
pm = "+" if b >= 0 else "-"
plt.plot(xp, yp, label=r'$P(y = 1) = $'+f'expit({a}x {pm} {absb})', color='red')


plt.xlabel("Sepal length (cm)")
plt.ylabel("Probability of Iris setosa")
plt.title("Logistic regression on the iris data")
plt.legend()
plt.grid(True)
plt.show()

Here \(\operatorname{expit}(x)\) denotes the logistic function (also called the sigmoid function); it is the inverse of the \(\operatorname{logit}\) function.
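The relationship between the fitted coefficients and the predicted probability can be sketched as follows. The functions expit and logit are written out with NumPy here (SciPy offers the same functions as scipy.special.expit and scipy.special.logit). As a simplifying assumption, the model in this sketch is fitted on all 150 flowers rather than on the training split, so its coefficients will differ from the ones in the plot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds log(p / (1 - p)), the inverse of expit."""
    return np.log(p / (1.0 - p))

iris = load_iris()
x = iris.data[:, [0]]                        # sepal length
is_setosa = (iris.target == 0).astype(int)   # 1 = setosa, 0 = other

# Assumption: fitted on all data, not on the training split used in the text
model = LogisticRegression(max_iter=200).fit(x, is_setosa)
a = model.coef_.item()
b = model.intercept_.item()

# The probability of class 1 equals expit(a * x + b)
p_model = model.predict_proba(x)[:, 1]
p_manual = expit(a * x[:, 0] + b)
print(np.allclose(p_model, p_manual))

# logit undoes expit
z = np.array([-2.0, 0.0, 3.0])
print(np.allclose(logit(expit(z)), z))

# Sepal length at which the modelled probability is exactly 0.5
boundary = -b / a
print(boundary)
```

Flowers with a sepal length on one side of this boundary are classified as setosa, the others as not setosa.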

The plot already suggests that this approach is not necessarily optimal. Let us nevertheless look at the quality measures:

y_test
y_pred = logreg.predict(X_test)
y_pred

array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
       1, 1, 0, 0, 0, 0, 0, 1])

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Precision:", precision_score(y_test, y_pred))

print("Recall:", recall_score(y_test, y_pred))

print("F1-Score:", f1_score(y_test, y_pred))

Accuracy: 0.9
Precision: 0.75
Recall: 0.8571428571428571
F1-Score: 0.8
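The four measures follow directly from the counts of the confusion matrix. The counts used below are not printed in the text; they are inferred from the reported values (30 test flowers, 8 predicted positives, precision 0.75, recall 6/7) and should be read as an assumption.

```python
# Confusion-matrix counts, inferred from the reported metrics (assumption)
tp, fp, fn, tn = 6, 2, 1, 21

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of correct predictions
precision = tp / (tp + fp)                   # correct among predicted positives
recall = tp / (tp + fn)                      # found among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of both

print(accuracy, precision, recall, f1)
```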

We can view this in somewhat more detail with the following command:

print(classification_report(y_test, y_pred, target_names=["negative", "positive"], digits=3))

              precision    recall  f1-score   support

    negative      0.955     0.913     0.933        23
    positive      0.750     0.857     0.800         7

    accuracy                          0.900        30
   macro avg      0.852     0.885     0.867        30
weighted avg      0.907     0.900     0.902        30

Let us take a brief look at all data (test and training data together):

print(classification_report(y_setosa, logreg.predict(X_setosa), target_names=["negative", "positive"], digits=3))

              precision    recall  f1-score   support

    negative      0.904     0.940     0.922       100
    positive      0.870     0.800     0.833        50

    accuracy                          0.893       150
   macro avg      0.887     0.870     0.877       150
weighted avg      0.892     0.893     0.892       150

Multinomial Logistic Regression

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report

We again split our data into test and training data:

# Classification data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2009)

We now create a logistic regression model and train it.

# Model
logreg = LogisticRegression(max_iter=200)
logreg.fit(X_train, y_train)

LogisticRegression(max_iter=200)

We now use the model to generate a prediction for the test data:

# Prediction
y_pred = logreg.predict(X_test)
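Behind each predicted label is one probability per class. The following self-contained sketch, which rebuilds the model from the raw arrays (variable names are illustrative), shows that predict_proba returns one column per species, that each row sums to 1, and that predict picks the class with the highest probability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2009)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# One row per test flower, one column per class
proba = model.predict_proba(X_test)
print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))

# predict returns the class with the highest probability
labels = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels, model.predict(X_test)))
```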

Let us look at the actual and the predicted values for the test data:

# Comparison on the test data
y_test
y_pred

array([0, 0, 2, 1, 1, 2, 2, 1, 2, 2, 2, 0, 2, 2, 2, 1, 2, 1, 0, 1, 1, 0,
       0, 1, 1, 2, 1, 2, 1, 0])

Let us compute the usual quality measures:

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Precision:", precision_score(y_test, y_pred, average='weighted'))

print("Recall:", recall_score(y_test, y_pred, average='weighted'))

print("F1-Score:", f1_score(y_test, y_pred, average='weighted'))

Accuracy: 0.9333333333333333
Precision: 0.9454545454545454
Recall: 0.9333333333333333
F1-Score: 0.9341025641025641

We can view this in somewhat more detail with the following command:

print(classification_report(y_test, y_pred, target_names=iris.target_names, digits=3))

              precision    recall  f1-score   support

      setosa      1.000     1.000     1.000         7
  versicolor      0.818     1.000     0.900         9
   virginica      1.000     0.857     0.923        14

    accuracy                          0.933        30
   macro avg      0.939     0.952     0.941        30
weighted avg      0.945     0.933     0.934        30
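The rows "macro avg" and "weighted avg" can be reproduced from the per-class rows. The sketch below does this for precision; the value 9/11 for versicolor is an assumption inferred from the rounded figure 0.818 together with a recall of 1.000 at a support of 9.

```python
import numpy as np

# Per-class values from the report above: setosa, versicolor, virginica
support = np.array([7, 9, 14])
precision = np.array([1.0, 9 / 11, 1.0])   # 9/11 is about 0.818 (assumed exact fraction)

# Macro average: plain mean over the classes
macro = precision.mean()

# Weighted average: mean weighted by the number of test flowers per class
weighted = np.sum(support * precision) / support.sum()

print(round(macro, 3), round(weighted, 3))  # 0.939 0.945
```

The weighted value matches the precision obtained with average='weighted' earlier in this section.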

Let us take a brief look at all data (test and training data together):

print(classification_report(y, logreg.predict(X), target_names=iris.target_names, digits=3))

              precision    recall  f1-score   support

      setosa      1.000     1.000     1.000        50
  versicolor      0.923     0.960     0.941        50
   virginica      0.958     0.920     0.939        50

    accuracy                          0.960       150
   macro avg      0.960     0.960     0.960       150
weighted avg      0.960     0.960     0.960       150

Footnotes

¹ cf. https://de.wikipedia.org/wiki/Schwertlilien-Datensatz
