Daily Shaarli

All links of one day in a single page.

December 9, 2023

learning statistics with jamovi

This textbook covers the contents of an introductory statistics class, as typically taught to undergraduate psychology, health or social science students.

Fun little introductory book for descriptive statistics, visualizations, then some theory, regressions and at the very end a tiny bit of Bayesian stats.
Uses 'jamovi', a statistical software package, but can be adapted to all manner of tools like SPSS, PSPP, R, Pandas, Polars, etc.

GitHub - wbolster/jsonlines: python library to simplify working with jsonlines and ndjson data

Read and write the JSON Lines format with Python; easy and efficient.

Causal Inference in R

Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and split right down the middle between statistical and programming learning.

GitHub - MechanicalSoup/MechanicalSoup: A Python library for automating interaction with websites.

Enables scraping of (and interaction with) websites; a little more high-level than BeautifulSoup, with a different API.

A relatively faster approach for reading json lines file into pandas dataframe

Fantastically easy approach for loading the .jsonl format into pandas.

Modern Polars

A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.

(It's not really an introduction; it does expect you to have some prior knowledge of, e.g., the core pandas concepts.)

GitHub - couchbase/fleece: A super-fast, compact, JSON-equivalent binary data format

JSON but with less overhead (and binary rather than plain text, but eh).

JSON Lines

The JSON Lines format (.jsonl, or I believe some also use .jl?)

Pretty easy to handle, good for streaming through information, can be extended and is fairly readable.
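To illustrate the streaming point, here is a minimal sketch using only the Python standard library. The helper name `iter_records` and the sample records are made up for illustration; a real file handle from `open(path)` should work the same way as the in-memory stream used here.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, reading lazily."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical sample data standing in for a .jsonl file on disk.
sample = io.StringIO(
    '{"title": "Modern Polars", "tags": ["polars", "pandas"]}\n'
    '{"title": "JSON Lines", "tags": ["json"]}\n'
)

# Only one line is held in memory at a time while iterating.
titles = [record["title"] for record in iter_records(sample)]
print(titles)
```

Because every line is a complete JSON document, the reader never has to parse more than one record at once, which is what makes the format pleasant for large or growing files.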

A little less readable than csv/tsv (if they are well formatted) but you can extend a file with another column at any point (which is a huge pain with csv), cells can have actual types and the formatting is much easier.
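A small sketch of the "extra column at any point" and "actual types" claims, again with only the standard library. The field names (`id`, `name`, `score`, `active`) are invented for the example, and filling missing fields with `None` is just one possible way to flatten the records into rows.

```python
import io
import json

buffer = io.StringIO()  # stands in for a .jsonl file opened in append mode

# Early records carry two fields.
buffer.write(json.dumps({"id": 1, "name": "a"}) + "\n")
# A later record adds new fields without rewriting the earlier lines.
buffer.write(json.dumps({"id": 2, "name": "b", "score": 0.5, "active": True}) + "\n")

records = [json.loads(line) for line in buffer.getvalue().splitlines()]

# Build a table view: union of all keys, missing values become None.
columns = sorted({key for record in records for key in record})
rows = [[record.get(column) for column in columns] for record in records]

# Values keep their JSON types (int, float, bool) instead of being strings.
print(columns)
print(rows)
```

With csv, the same change would mean rewriting the header and every existing row, and every cell would come back as a string.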