Using Altair to visualize data, a book by jjallaire. Nice explanations, starting from the basics.
Use SQL queries to solve the murder mystery. Suitable for beginners or experienced SQL sleuths.
Learn SQL with a murder-mystery game. How cool!
Grab data from websites (for now it has pre-made importers for Shaarli and GitHub Awesome-lists, but you can create your own) and do stuff with it.
Uses a variety of processors and exporters to shape, fix and transform the imported data, all based on YAML 'pipeline' specifications. Essentially, it's a bit like setting up GitHub Actions/Woodpecker CI for arbitrary pages.
A CLI wrapper combining git and dvc. Instead of first running dvc commit, then git commit, then the individual pushes, you can do it all with a single fds commit.
Similarly with fds status, which is probably the most used command: it gives an at-a-glance overview of the current project status for both data and code.
Because it is built as a wrapper, it still lets you delve into the individual programs for more advanced operations. Pretty clever, actually!
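The wrapper idea can be sketched roughly in Python. This is not fds's actual implementation: the function name combined_commit, the dry_run switch and the exact command sequence (including git add -A) are my own assumptions, just to show how one command can fan out into the dvc and git steps.

```python
import subprocess


def combined_commit(message: str, dry_run: bool = False) -> list:
    """Hypothetical fds-style commit: DVC data first, then Git, then both pushes."""
    steps = [
        ["dvc", "commit"],                 # store changes of DVC-tracked data in the cache
        ["git", "add", "-A"],              # stage code plus the updated .dvc metadata files
        ["git", "commit", "-m", message],  # one Git commit covering code and data pointers
        ["dvc", "push"],                   # upload cached data to the DVC remote
        ["git", "push"],                   # push the commit to the Git remote
    ]
    if not dry_run:
        for cmd in steps:
            # check=True raises CalledProcessError, stopping at the first failing step
            subprocess.run(cmd, check=True)
    return steps


# Dry run: only prints the planned command sequence, nothing is executed.
for cmd in combined_commit("Update cleaned dataset", dry_run=True):
    print(" ".join(cmd))
```

Running the steps in order with check=True means a failed data commit never leaves you with a Git commit pointing at data that isn't there.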
Automatically upgrade your Polars code to use the latest syntax available. Run it over Python Polars code and it swaps out deprecated stuff for up-to-date stuff. Simple, easy, nice.
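To illustrate the kind of change such a tool makes, here is a toy text rewrite. It is a hypothetical sketch, not how the tool works: it only knows two renames that, as far as I know, happened in the Polars 0.19 series (groupby to group_by, cumsum to cum_sum), and upgrade_source is a made-up name.

```python
import re

# Two renames from the Polars 0.19 series, hard-coded for illustration only.
RENAMES = {
    "groupby": "group_by",
    "cumsum": "cum_sum",
}

# Require "(" right after the name, so longer names such as ".groupby_dynamic(" don't match.
PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, RENAMES)) + r")\(")


def upgrade_source(code: str) -> str:
    """Toy, text-based rewrite of deprecated method names (no parsing involved)."""
    return PATTERN.sub(lambda m: "." + RENAMES[m.group(1)] + "(", code)


old = 'df.groupby("a").agg(pl.col("b").cumsum())'
print(upgrade_source(old))
```

A plain regex like this would also rewrite a pandas .groupby() call, which is exactly why a real upgrader has to understand the code rather than just match text.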
Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and split right down the middle between statistical and programmatic learning.
A side-by-side comparison of the Polars and Pandas libraries. A nice, gentle comparison, and thus a simultaneous 'introduction' to both tools.
(It's not really an introduction; it does expect some prior knowledge of, e.g., the core pandas concepts.)
Enables scraping of (and interaction with) websites; a little more high-level than BeautifulSoup, with a different API.
A fantastically easy approach to loading .jsonl files into pandas.
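The bookmark doesn't spell out the approach, but one way I know works is pandas' own read_json with lines=True, which parses each line as a separate JSON record. The two sample records below are made up.

```python
import io

import pandas as pd

# Two JSON Lines records: one JSON object per line.
jsonl_text = '{"name": "a", "value": 1}\n{"name": "b", "value": 2}\n'

# lines=True tells pandas to treat every line as its own record (one row each).
df = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df)
```

For a file on disk the same call takes a path, e.g. pd.read_json("data.jsonl", lines=True), where the filename is just a placeholder.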
Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.
An in-depth description of handling strings and textual data with tidy principles. Really neat for applications such as NLP, sentiment analysis or text modeling.
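The core 'one token per row' idea (what tidytext's unnest_tokens() does in R) can be approximated in pandas. This is a rough sketch: the two sample sentences are made up, and splitting on whitespace is a simplification of real tokenization (punctuation is not stripped, for instance).

```python
import pandas as pd

docs = pd.DataFrame({
    "doc": [1, 2],
    "text": ["Tidy data is tidy", "Text is data"],
})

# One token per row: lowercase, split on whitespace, then explode the word lists.
tokens = (
    docs.assign(word=docs["text"].str.lower().str.split())
        .explode("word")
        .drop(columns="text")
        .reset_index(drop=True)
)

# Word counts across all documents, the kind of table tidy text analysis starts from.
counts = tokens["word"].value_counts()
print(counts)
```

Once text is in this shape, counting, joining against a sentiment lexicon or filtering stop words are all ordinary data frame operations.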
This book demonstrates how to use the Tidyverse collection of packages for doing data science.
A long, in-depth description of tidy data: how to get there and which pitfalls to avoid. Also gives pointers on visualization and modeling.
Written for R, but a lot of the concepts can be applied universally.
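As a small illustration of one central move, turning a 'wide' table whose column headers are really values into a tidy one with one observation per row, here is a pandas sketch using melt() (tidyr's pivot_longer() does the equivalent in R). The country labels and counts are made up.

```python
import pandas as pd

# A "wide" (untidy) table: the year variable is hidden in the column headers.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1999": [10, 30],
    "2000": [20, 40],
})

# Tidy form: one row per (country, year) observation, one column per variable.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")
print(tidy)
```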
Solutions to the exercises in “R for Data Science” by Garrett Grolemund and Hadley Wickham.
The book itself is also available online, here: https://r4ds.had.co.nz/
While written for R, the exercises could easily be adapted to, say, Python pandas and undertaken that way.
Same thing as gron: turn JSON into stream-friendly (i.e. greppable) destructured text.
Just faster, I suppose. Haven't really tried it myself.
GReppablejsON!
Super awesome for destructuring JSON responses, so you can then simply work with them through grep or other text-only stream tools on the command line.
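To get a feel for what this destructuring looks like, here is a rough Python sketch that emits gron-style assignment lines. It approximates gron's output format from memory rather than reproducing it exactly (the key sorting and the dot-versus-bracket rule are simplifications), and the function name gron is just a label for this sketch.

```python
import json


def gron(value, path="json"):
    """Yield gron-style assignment lines for every node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            # Identifier-like keys get dot notation, anything else bracket notation.
            child = f"{path}.{key}" if key.isidentifier() else f"{path}[{json.dumps(key)}]"
            yield from gron(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, item in enumerate(value):
            yield from gron(item, f"{path}[{i}]")
    else:
        # Scalars: json.dumps gives JSON spelling (quoted strings, true/false, null).
        yield f"{path} = {json.dumps(value)};"


data = json.loads('{"user": {"name": "Ada", "tags": ["x", "y"]}, "id": 7}')
lines = list(gron(data))
print("\n".join(lines))
```

Because every line carries its full path, a simple substring filter such as [l for l in lines if ".tags" in l] (or plain grep on the real tool's output) pulls out one branch of the document.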
A JSON Query Language CLI tool. A little like the venerable jq, a little not: it has a different selection (or 'query') syntax.
Basically, you pass in JSON and select specific parts of it from whatever data you have.
It felt a little weird to me at first, but it might actually be easier for intermediate/advanced use cases than the mind-bending mess jq sometimes turns into for me
(or rather, the constant trial and error with jiq that it turns into).
Enables rendering CriticMarkup in your Quarto output.
Could be really useful for an MS Word-less authoring pipeline.
BibTeX parser for Python 3. Parse BibTeX, then do whatever you want with it as a Python data structure.
One example of going from BibTeX to a pandas DataFrame is here
A pretty flexible and interesting approach to organizing data science projects. Combined with https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/get-started-open-reproducible-science/best-practices-for-organizing-open-reproducible-science/ for more academically oriented ideas, this should give a rough guide to finding good organizational structures.
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
TSV data wrangling utilities from the command line.
Written in the D language.
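The tools themselves are the point here (fast, streaming, compiled), but the kind of operation a filter performs can be sketched in a few lines of Python: read a headered TSV row by row and keep rows whose numeric field passes a test. This is not tsv-utils' interface; filter_tsv and its parameters are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO


def filter_tsv(stream: TextIO, column: str, threshold: float) -> Iterator[dict]:
    """Yield rows of a headered TSV stream whose `column` value is greater than `threshold`."""
    # Plain TSV has no quoting, so disable csv's quote handling.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if float(row[column]) > threshold:
            yield row


sample = io.StringIO("name\tscore\na\t5\nb\t12\nc\t30\n")
kept = [row["name"] for row in filter_tsv(sample, "score", 10)]
print(kept)
```

Being a generator, it never holds the whole file in memory, which is the same streaming property that makes command-line tools like these usable on large files.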