Weekly Shaarli
Week 49 (December 4, 2023)
This textbook covers the contents of an introductory statistics class, as typically taught to undergraduate psychology, health or social science students.
Fun little introductory book for descriptive statistics, visualizations, then some theory, regressions and at the very end a tiny bit of Bayesian stats.
Uses 'jamovi', a statistical software package, but can be adapted to all manner of tools like SPSS, PSPP, R, Pandas, Polars, etc.
Read and write json line format with python, easy, efficient.
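For reference, the basic read/write round trip can be sketched with nothing but the standard library. This is not the linked tool's API, just a minimal illustration; `write_jsonl`, `read_jsonl` and the file name `links.jsonl` are names made up for this sketch.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Lazily yield one parsed record per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Made-up sample records, written to a temporary file and read back.
records = [{"id": 1, "tags": ["stats", "book"]}, {"id": 2, "tags": []}]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "links.jsonl"
    write_jsonl(path, records)
    loaded = list(read_jsonl(path))
```

Because `read_jsonl` is a generator, a large file can be processed line by line without loading it into memory all at once.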
Yet Another.. zi? I don't know what the title stands for, but I do know it looks like a pretty fast and smooth TUI fm.
This book demonstrates how to use the Tidyverse collection of packages for doing data science.
A long and in-depth description of tidy data, how to arrive there, pitfalls to avoid. Also gives pointers on visualization and modeling.
Written for R, but a lot of the concepts can be applied universally.
The Crystal Programming Language.
An interesting mix of Ruby-ish syntax, static type checking and compilation, with the ability to create C bindings. Looks like a neat little language!
Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and splitting right down the middle of statistical and programmatic learning.
Enables scraping of (and interaction with) websites; a little more high-level than BeautifulSoup, and with a different API.
Fantastically easy approach for .jsonl format to be loaded into pandas.
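I assume the approach boils down to the `lines=True` option of `pandas.read_json`, though the linked page may do it differently. A minimal sketch with made-up sample data:

```python
import io

import pandas as pd

# Made-up sample data: two JSON objects, one per line.
jsonl_text = '{"title": "Crystal", "stars": 3}\n{"title": "Red", "stars": 2}\n'

# lines=True tells pandas to parse each line as a separate JSON record.
# A file path works in place of the StringIO buffer.
df = pd.read_json(io.StringIO(jsonl_text), lines=True)

# The reverse direction: one JSON object per row, one row per line.
out = df.to_json(orient="records", lines=True)
```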
Red is a next-generation programming language strongly inspired by Rebol, but with a broader field of usage thanks to its native-code compiler, from system programming to high-level scripting and cross-platform reactive GUI, while providing modern support for concurrency, all in a zero-install, zero-config, single ~1MB file!
A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.
(It's not really an introduction, it does expect you to have some prior knowledge on e.g. the pandas core concepts.)
JSON, but with less overhead. (And binary rather than plain text, but eh.)
The JSON lines format (.jsonl, or I believe some also do .jl?)
Pretty easy to handle, good for streaming through information, can be extended and is fairly readable.
A little less readable than csv/tsv (if they are well formatted) but you can extend a file with another column at any point (which is a huge pain with csv), cells can have actual types and the formatting is much easier.
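A small sketch of the "extra column at any point" and "actual types" claims, using only the standard library. The field names and values are made up for illustration, and an in-memory buffer stands in for a real .jsonl file.

```python
import io
import json

# In-memory stand-in for a .jsonl file.
stream = io.StringIO()
stream.write(json.dumps({"name": "a", "count": 1}) + "\n")
# A later record introduces new fields; the earlier line stays untouched.
stream.write(json.dumps({"name": "b", "count": 2, "ratio": 0.5, "ok": True}) + "\n")

# Stream back through the lines, parsing each one independently.
stream.seek(0)
rows = [json.loads(line) for line in stream if line.strip()]

# Collect the union of all keys; records missing a key simply yield None.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Values come back as real ints, floats and booleans rather than strings, which is the part that CSV cannot do without a separate schema.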
Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.
An in-depth description of handling strings, textual data with the tidy principles. Really neat for applications such as NLP or sentiment analysis or text modeling.
Solutions to the exercises in “R for Data Science” by Garrett Grolemund and Hadley Wickham.
The book itself is also available online, here: https://r4ds.had.co.nz/
While written for R, the exercises could easily be adapted for, say, Python with pandas, and worked through that way.
Makes Linux (built for Debian/Xubuntu) look exactly like Win95, just in case you ever feel nostalgic for a weirdly homey desktop feel.