A long list of data engineering resources - very focused on 'networking', 'company-driven' efforts and social media opportunities but contains nice book, course and certification recommendations as well.
Documentation of the OpenPLZ API project - a REST API for looking up German, Austrian and Swiss zip codes (Postleitzahlen), street names and municipalities (Gemeinden).
Takes much of its data from OpenStreetMap, from which it compiles a full CSV file for Germany here.
(Q)GIS plugin to load German data sets, both national and subnational (county-level) ones. Comes bundled with many datasets for easy consumption.
Very useful spatio-temporal analysis tools within the python / jupyter ecosystem.
Can do all sorts of spatial analysis and visualization, but also more specific things like spatial inequality computation (using e.g. Theil index, Gini, ...).
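For a feel of what the inequality measures mentioned above compute, here is a minimal pure-Python sketch of the Gini coefficient and the Theil T index. It is not the library's API: the function names gini and theil are made up for illustration, and the inputs are assumed to be non-negative numbers (e.g. incomes per region).

```python
import math


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, with ranks i = 1..n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def theil(values):
    """Theil T index of non-negative values: 0 means perfect equality."""
    n = len(values)
    mean = sum(values) / n if n else 0.0
    if mean == 0:
        return 0.0
    # Zero values contribute nothing, following the limit x * ln(x) -> 0.
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n
```

With everything concentrated in one of n units, gini approaches (n - 1) / n and theil reaches ln(n), so both grow as the distribution gets more unequal.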
Or, as the polars docs recommend:
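import polars as pl

df = pl.DataFrame(
    {"abc": [-2.5, 5.0], "mno": ["hello", "world"], "xyz": [True, False]}
)
with pl.Config(
    tbl_formatting="ASCII_MARKDOWN",
    tbl_hide_column_data_types=True,
    tbl_hide_dataframe_shape=True,
):
    # The settings only apply inside this block: prints df as a Markdown table.
    print(df)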
Quickly generate visualizations from pandas dataframes, by drag and drop.
Can be really handy for exploratory data viz.
Using altair to visualize data, a book by jjallaire. Nice explanations, starts from the basics.
Use SQL queries to solve the murder mystery. Suitable for beginners or experienced SQL sleuths.
Learn SQL with a murder-mystery game. How cool!
Grab data from websites (for now it has pre-made importers for Shaarli and GitHub Awesome lists, but you can create your own) and do stuff with it.
Uses a variety of processors and exporters to shape, fix and transform the imported stuff. All based on yml 'pipeline' specifications. Essentially, it's a bit like setting up github actions/woodpecker CI for arbitrary pages.
A CLI wrapper combining git and dvc. Instead of first doing dvc commit, then git commit, then individual pushes, you can just do it all with one fds commit.
Similarly with fds status, which is probably the most used command - get a quick at-a-glance overview of current project status for both data and code.
By being built as a wrapper it of course still allows delving into the individual programs for more advanced operations. Pretty clever, actually!
Automatically upgrade your Polars code to use the latest syntax available. Run over python polars code and it switches out deprecated stuff for up-to-date stuff. Simple, easy, nice.
Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and splitting right down the middle of statistical and programmatic learning.
A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.
(It's not really an introduction, it does expect you to have some prior knowledge on e.g. the pandas core concepts.)
Enables scraping of (and interaction with) websites, a little more high-level and with a different API than beautifulsoup.
Fantastically easy approach for .jsonl format to be loaded into pandas.
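The linked approach itself is not reproduced here, so the following is only a sketch of the idea. A .jsonl file holds one JSON object per line; pandas can read that directly with pd.read_json(path, lines=True), and the stdlib version shows what happens underneath. The function names read_jsonl and read_jsonl_pandas are illustrative, not taken from the bookmarked page.

```python
import json


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts using only the stdlib."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records


def read_jsonl_pandas(path):
    """Load the same file straight into a DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so read_jsonl works without pandas

    return pd.read_json(path, lines=True)
```

The list of dicts from read_jsonl can also be handed to pd.DataFrame(...) if some cleaning has to happen before the data reaches pandas.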
Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.
An in-depth description of handling strings, textual data with the tidy principles. Really neat for applications such as NLP or sentiment analysis or text modeling.
This book demonstrates how to use the Tidyverse collection of packages for doing data science.
A long and in-depth description of tidy data, how to arrive there, pitfalls to avoid. Also gives pointers on visualization and modeling.
Written for R, but a lot of the concepts can be applied universally.
Solutions to the exercises in “R for Data Science” by Garrett Grolemund and Hadley Wickham.
The book itself is also available online, here: https://r4ds.had.co.nz/
While written for R, the exercises could easily be adapted for, say, Python pandas and worked through that way.
Same thing as GRON - turn JSON into stream-friendly (i.e. greppable) destructured text.
Just faster! I suppose. Haven't really tried it myself
GReppablejsON!
Super awesome for de-structuring json responses to then simply work with through grep or other text-only speaking stream tools on the command line.
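To illustrate what "de-structuring" means here, a rough Python sketch of the idea: every value in the JSON document becomes its own line with the full path in front of it, so grep can match on path and value at once. This is a simplified imitation under my own assumptions (sorted keys, the helper name flatten), not gron's actual implementation or its exact output rules.

```python
import json
import re

# Keys that are plain identifiers are written as .key, everything else as ["key"].
_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON document."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in sorted(value.items()):
            if _IDENTIFIER.match(key):
                child_path = f"{path}.{key}"
            else:
                child_path = f"{path}[{json.dumps(key)}]"
            yield from flatten(child, child_path)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written back out as JSON literals (strings quoted, true/false/null).
        yield f"{path} = {json.dumps(value)};"


document = json.loads('{"name": "Tom", "tags": ["a", "b"], "ok": true}')
lines = list(flatten(document))
```

Each entry in lines is self-contained (for example json.tags[1] = "b";), which is what makes the output friendly to grep and other line-oriented tools.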