A Jupyter server based on FastAPI. Allows pretty easy self-hosting (for smaller instances) and thus collaborative editing. Supports Fief authentication, which is neat.
A rundown of the dangers of p-hacking and fishing expeditions when doing data science, and strategies to avoid succumbing to them. Nice, concise talk.
Allows you to code up simple dashboards using a combination of beanql (to select the data) and Python (to operate on it) for fava (the Beancount web GUI).
Super useful!
A 'simple' (i.e. very powerful and complex once you get into the weeds) tool to ingest data and later plot it out again. Creates the kind of plots that are fairly well known from computer science applications: network or drive reads, throughput, CPU usage and so on (e.g. in OpenMediaVault and, I believe, Nextcloud).
A long list of data engineering resources - very focused on 'networking', 'company-driven' efforts and social media opportunities but contains nice book, course and certification recommendations as well.
Documentation of the OpenPLZ API project - a REST API for querying German, Austrian and Swiss postal codes (Postleitzahlen), street names and municipalities (Gemeinden).
Takes much of its data from OpenStreetMap, from which it compiles a full CSV file for Germany here.
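To give a feel for calling such a REST API from Python, here is a minimal sketch using requests; the base URL, route and parameter names are hypothetical placeholders, not taken from the OpenPLZ documentation.

    import requests

    # Hypothetical base URL and route -- check the OpenPLZ API docs for the real ones.
    BASE_URL = "https://example.org/openplz-api"

    def lookup_postal_code(postal_code: str) -> list:
        """Return the localities registered for a given German postal code."""
        response = requests.get(
            f"{BASE_URL}/de/localities",          # placeholder route
            params={"postalCode": postal_code},   # placeholder parameter name
            timeout=10,
        )
        response.raise_for_status()
        return response.json()

    print(lookup_postal_code("10115"))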
(Q)GIS plugin to load German datasets, both national and subnational county-wide ones. Comes with many datasets included for easy consumption.
Very useful spatio-temporal analysis tools within the Python / Jupyter ecosystem.
Can do all sorts of spatial analysis and visualization, but also more specific things like spatial inequality computation (using e.g. the Theil index, Gini, ...); a small illustration follows below.
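As a rough illustration of the kind of inequality measure mentioned above, here is a minimal Gini coefficient computed with plain NumPy over made-up per-region values; it does not use the linked library, whose own API may differ.

    import numpy as np

    def gini(values) -> float:
        """Gini coefficient of non-negative values (0 = perfect equality)."""
        v = np.sort(np.asarray(values, dtype=float))
        n = v.size
        cumulative = np.cumsum(v)
        # Standard closed form based on the sorted values' cumulative sums.
        return (n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n

    # Made-up per-region income figures, purely for illustration.
    regional_income = [12_000, 18_000, 22_000, 40_000, 95_000]
    print(f"Gini: {gini(regional_income):.3f}")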
Or, as the Polars docs recommend:

    import polars as pl

    df = pl.DataFrame(
        {"abc": [-2.5, 5.0], "mno": ["hello", "world"], "xyz": [True, False]}
    )
    with pl.Config(
        tbl_formatting="ASCII_MARKDOWN",
        tbl_hide_column_data_types=True,
        tbl_hide_dataframe_shape=True,
    ):
        print(df)  # prints the frame as a Markdown-style table
Quickly generate visualizations from pandas dataframes, by drag and drop.
Can be really handy for exploratory data viz.
Using altair to visualize data, a book by jjallaire. Nice explanations, starts from the basics.
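For a flavour of what Altair code looks like (not an excerpt from the book, just a minimal sketch over made-up data):

    import altair as alt
    import pandas as pd

    # Tiny made-up dataset.
    df = pd.DataFrame({
        "year": [2019, 2020, 2021, 2022],
        "value": [3.2, 4.1, 3.8, 5.0],
    })

    # Declarative spec: data, then marks, then encodings.
    chart = alt.Chart(df).mark_line(point=True).encode(
        x="year:O",
        y="value:Q",
    )
    chart.save("chart.html")  # or display inline in a Jupyter notebook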
Use SQL queries to solve the murder mystery. Suitable for beginners or experienced SQL sleuths.
Learn SQL with a murder-mystery game. How cool!
Grab data from websites (for now it has pre-made importers for Shaarli and GitHub Awesome lists, but you can create your own), and do stuff with it.
Uses a variety of processors and exporters to shape, fix and transform the imported content. All based on YAML 'pipeline' specifications. Essentially, it's a bit like setting up GitHub Actions / Woodpecker CI for arbitrary pages.
A CLI wrapper combining git and DVC. Instead of doing first dvc commit, then git commit, then individual pushes, you can do it all with one fds commit. Similarly with fds status, which is probably the most used command: get a quick at-a-glance overview of the current project status for both data and code.
By being built as a wrapper it of course still allows delving into the individual programs for more advanced operations. Pretty clever, actually!
Automatically upgrade your Polars code to use the latest syntax available. Run it over Python code that uses Polars and it switches out deprecated calls for up-to-date ones. Simple, easy, nice.
Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and splitting right down the middle of statistical and programmatic learning.
A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.
(It's not really an introduction, though - it does expect you to have some prior knowledge of e.g. the core pandas concepts.)
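As a taste of the kind of contrast such a comparison covers (a generic example of my own, not taken from the linked piece), here is the same group-and-aggregate written in both libraries:

    import pandas as pd
    import polars as pl

    data = {"city": ["Berlin", "Berlin", "Hamburg"], "sales": [10, 20, 5]}

    # pandas: index-centric, eager evaluation.
    pdf = pd.DataFrame(data)
    print(pdf.groupby("city", as_index=False)["sales"].sum())

    # Polars: expression-centric ("group_by" in recent versions), with a lazy API also available.
    pldf = pl.DataFrame(data)
    print(pldf.group_by("city").agg(pl.col("sales").sum()))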
Enables scraping of (and interaction with) websites; a little more high-level and with a different API than BeautifulSoup.
A fantastically easy approach for loading the .jsonl format into pandas.
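Presumably something along these lines - pandas reads newline-delimited JSON directly (the file name is just a placeholder):

    import pandas as pd

    # lines=True tells pandas to treat each line as one JSON object (.jsonl).
    df = pd.read_json("records.jsonl", lines=True)
    print(df.head())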
Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.
An in-depth description of handling strings and textual data with tidy principles. Really neat for applications such as NLP, sentiment analysis, or text modeling.
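The book itself works in R with tidytext, but the core "one token per row" idea carries over to the Python stack used elsewhere in these notes; a rough pandas sketch (my own, not from the book) might look like this:

    import pandas as pd

    docs = pd.DataFrame({
        "doc_id": [1, 2],
        "text": ["tidy data is neat", "text mining made tidy"],
    })

    # One token per row: lower-case, split into words, explode into a long table.
    tidy_tokens = (
        docs.assign(word=docs["text"].str.lower().str.split())
            .explode("word")[["doc_id", "word"]]
    )

    # Ordinary group/count operations now give word frequencies per document.
    print(tidy_tokens.value_counts(["doc_id", "word"]))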