Weekly Shaarli

All links of one week in a single page.

Week 49 (December 4, 2023)

learning statistics with jamovi

This textbook covers the contents of an introductory statistics class, as typically taught to undergraduate psychology, health or social science students.

Fun little introductory book for descriptive statistics, visualizations, then some theory, regressions and at the very end a tiny bit of Bayesian stats.
Uses 'jamovi', a statistical software package, but can be adapted to all manner of tools like SPSS, PSPP, R, Pandas, Polars, etc.

GitHub - wbolster/jsonlines: python library to simplify working with jsonlines and ndjson data

Read and write the JSON Lines format with Python: easy and efficient.

GitHub - sxyazi/yazi: 💥 Blazing fast terminal file manager written in Rust, based on async I/O.

Yet Another.. zi? I don't know what the title stands for, but I do know it looks like a pretty fast and smooth TUI file manager.

Chapter 3 Wrangling Data in the Tidyverse | Tidyverse Skills for Data Science

This book demonstrates how to use the Tidyverse collection of packages for doing data science.

A long and in-depth description of tidy data, how to arrive there, pitfalls to avoid. Also gives pointers on visualization and modeling.

Written for R, but a lot of the concepts can be applied universally.

GitHub - crystal-lang/crystal: The Crystal Programming Language

The Crystal Programming Language.

An interesting mix of ruby-ish syntax, static type checking and compilation, with the ability to create C-bindings. Looks like a neat little language!

Causal Inference in R

Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and it splits right down the middle between statistical and programmatic learning.

GitHub - MechanicalSoup/MechanicalSoup: A Python library for automating interaction with websites.

Enables scraping of (and interaction with) websites; a little more high-level than beautifulsoup, and with a different API.

A relatively faster approach for reading json lines file into pandas dataframe

Fantastically easy approach for .jsonl format to be loaded into pandas.

GitHub - red/red: Red is a next-generation programming language strongly inspired by Rebol, but with a broader field of usage thanks to its native-code compiler, from system programming to high-level scripting and cross-platform reactive GUI, while providing modern support for concurrency, all in a zero-install, zero-config, single ~1MB file!

Modern Polars

A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.

(It's not really an introduction, it does expect you to have some prior knowledge on e.g. the pandas core concepts.)

GitHub - couchbase/fleece: A super-fast, compact, JSON-equivalent binary data format

JSON, but with less overhead. (And binary rather than plain text, but eh.)

JSON Lines

The JSON Lines format (.jsonl, or I believe some also use .jl?)

Pretty easy to handle, good for streaming through information, can be extended and is fairly readable.

A little less readable than csv/tsv (if they are well formatted) but you can extend a file with another column at any point (which is a huge pain with csv), cells can have actual types and the formatting is much easier.
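
To make the points above concrete, here is a minimal sketch using only Python's standard library (not the jsonlines package linked earlier). The helper names `write_jsonl` and `read_jsonl` and the sample records are made up for illustration, and an in-memory buffer stands in for a real .jsonl file.

```python
import io
import json


def write_jsonl(records, fh):
    """Write one JSON object per line to an open text file handle."""
    for record in records:
        fh.write(json.dumps(record) + "\n")


def read_jsonl(fh):
    """Yield one parsed record per non-empty line, so the file is streamed."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory buffer standing in for a .jsonl file on disk (illustrative data).
buffer = io.StringIO()
write_jsonl([{"id": 1, "title": "yazi"}, {"id": 2, "title": "crystal"}], buffer)

# A later record can carry extra fields without rewriting the earlier lines,
# which is the "add a column at any point" property that is painful with csv.
write_jsonl([{"id": 3, "title": "red", "stars": None, "tags": ["lang"]}], buffer)

buffer.seek(0)
records = list(read_jsonl(buffer))
```

Values come back with their JSON types (integers, lists, null as `None`) rather than as plain strings, and older records simply lack the newer keys, so a reader should use something like `record.get("tags")` when fields are optional.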

1 The tidy text format | Text Mining with R

Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.

An in-depth description of handling strings and textual data with the tidy principles. Really neat for applications such as NLP, sentiment analysis or text modeling.

R for Data Science: Exercise Solutions

Solutions to the exercises in “R for Data Science” by Garrett Grolemund and Hadley Wickham.

The book itself is also available online, here: https://r4ds.had.co.nz/

While written for R, the exercises could easily be adapted for, say, Python pandas, and undertaken that way.

GitHub - grassmunk/Chicago95: A rendition of everyone's favorite 1995 Microsoft operating system for Linux.

Make Linux (made for Debian/Xubuntu) look exactly like Win95, just in case you ever feel nostalgic for a weirdly homey desktop feel.