12 changes: 12 additions & 0 deletions Cheatsheats/plotting.md
@@ -0,0 +1,12 @@
| Feature Type | Target Type | What you want to see | Plot to use | What it tells you (in simple words) | Python example |
| ------------ | ------------------- | -------------------------------------- | ------------------- | ------------------------------------ | ----------------------------------------------- |
| Number | Number | Does feature increase/decrease target? | Scatter | Shows upward/downward pattern | `sns.scatterplot(x='age', y='salary', data=df)` |
| Number | Number | Are there extreme values? | Scatter | Dots far away = outliers | `sns.scatterplot(x='age', y='salary', data=df)` |
| Number | Number | Is data spread normal or skewed? | Histogram | Shows shape of data | `sns.histplot(df['age'], kde=True)` |
| Number       | Number              | Compare many numeric features at once  | Heatmap             | Which feature relates most to target | `sns.heatmap(df.corr(numeric_only=True), annot=True)` |
| Number | Class (0/1, Yes/No) | Do classes look different? | Box plot | If boxes separate → feature useful | `sns.boxplot(x='class', y='age', data=df)` |
| Number | Class | How dense values are per class | Violin plot | Shows distribution per class | `sns.violinplot(x='class', y='age', data=df)` |
| Category | Number | Which category has higher target? | Bar plot (mean) | Shows average target per category | `sns.barplot(x='city', y='sales', data=df)` |
| Category | Number | Which category appears most? | Count plot | Shows frequency | `sns.countplot(x='city', data=df)` |
| Category | Class | Relation between two categories | Count plot with hue | Shows class split per category | `sns.countplot(x='gender', hue='buy', data=df)` |
| Many Numbers | Number/Class | Overall relationship view | Pairplot | Quick scan of all relations | `sns.pairplot(df, hue='class')` |
21 changes: 21 additions & 0 deletions Cheatsheats/testing.md
@@ -0,0 +1,21 @@
| Test | Purpose (What it checks) | Data Type | Key Assumptions | Python (scipy/statsmodels) |
| ------------------------ | ------------------------------------------------- | ----------------------------------- | ----------------------------------------------------- | ------------------------------------------------------------------ |
| **One-sample t-test**    | Compare sample mean to a known value              | Numeric                             | Normal data                                           | `stats.ttest_1samp(x, popmean=mu)`                                 |
| **Independent t-test** | Compare means of 2 independent groups | Numeric (2 groups) | Normal, equal variance | `stats.ttest_ind(a, b)` |
| **Paired t-test** | Compare same group before vs after | Numeric (paired) | Normal differences | `stats.ttest_rel(a, b)` |
| **One-way ANOVA** | Compare means of 3+ groups | Numeric (3+ groups) | Normal, equal variance | `stats.f_oneway(g1, g2, g3)` |
| **One-way ANOVA (OLS)**  | Check if **mean age differs across classes**      | Numeric (age) + categorical (class) | Normal residuals, equal variance, independent samples | `model = ols('age ~ C(Q("class"))', data=df).fit()` + `sm.stats.anova_lm(model)` (`class` is a Python keyword, so the formula needs `Q("class")`) |
| **Two-way ANOVA**        | Effect of 2 factors (and their interaction) on mean | Numeric + 2 categorical factors   | Normal residuals, equal variance                      | `model = ols('y ~ C(f1) * C(f2)', data=df).fit()` + `sm.stats.anova_lm(model, typ=2)` |
| **Mann–Whitney U** | 2 groups, non-normal | Ordinal/Numeric | Independent | `stats.mannwhitneyu(a, b)` |
| **Wilcoxon test** | Paired, non-normal | Ordinal/Numeric | Paired | `stats.wilcoxon(a, b)` |
| **Kruskal–Wallis** | 3+ groups, non-normal | Ordinal/Numeric | Independent | `stats.kruskal(g1, g2, g3)` |
| **Chi-square test** | Relationship between categories | Categorical | Expected freq > 5 | `stats.chi2_contingency(table)` |
| **Fisher’s Exact** | Small categorical samples | Categorical | 2×2 table | `stats.fisher_exact(table)` |
| **Pearson correlation** | Linear relation between 2 vars | Numeric | Normal, linear | `stats.pearsonr(x, y)` |
| **Spearman correlation** | Rank-based relation | Ordinal/Numeric | Monotonic | `stats.spearmanr(x, y)` |
| **Linear regression** | Predict Y from X | Numeric | Linearity, normal errors | `stats.linregress(x, y)` |
| **Logistic regression** | Predict binary outcome | Numeric + categorical | Independent | `sklearn.linear_model.LogisticRegression()` |
| **Shapiro-Wilk** | Test for normality | Numeric | Random sample | `stats.shapiro(x)` |
| **Kolmogorov–Smirnov**   | Compare to a fully specified distribution         | Numeric                             | Continuous; `'norm'` alone means N(0, 1), so standardize `x` or pass `args=(mean, std)` | `stats.kstest(x, 'norm')` |
| **Levene’s test** | Test equal variance | Numeric | Independent | `stats.levene(a, b)` |
| **Tukey’s HSD** | Find **which specific groups differ** after ANOVA | Numeric + categorical (3+ groups) | Normal data, equal variance, independent groups | `statsmodels.stats.multicomp.pairwise_tukeyhsd(endog=y, groups=g)` |
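
One possible way to use the table above is to check the assumptions first and let the result pick the two-group test. The sketch below is illustrative only: the samples are synthetic, the 0.05 threshold is a conventional choice rather than a rule, and switching to Welch's t-test (`equal_var=False`) when variances differ is an assumption that goes beyond what the table lists.

```python
import numpy as np
from scipy import stats

# Synthetic, made-up samples for two independent groups.
rng = np.random.default_rng(42)
a = rng.normal(50, 5, 40)
b = rng.normal(53, 5, 40)

alpha = 0.05  # conventional significance threshold (an assumption)

# Shapiro-Wilk on each group: a p-value above alpha means no evidence against normality.
normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
# Levene's test: a p-value above alpha means no evidence of unequal variances.
equal_var = stats.levene(a, b).pvalue > alpha

if normal:
    # Student's t-test if variances look equal, Welch's t-test otherwise.
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    test_name = "Student t-test" if equal_var else "Welch t-test"
else:
    # Non-parametric fallback for non-normal data.
    result = stats.mannwhitneyu(a, b)
    test_name = "Mann-Whitney U"

print(f"{test_name}: p = {result.pvalue:.4f}")
```

Failing to reject normality or equal variance does not prove either assumption holds, so it is worth confirming with a histogram or box plot as well.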