A Jupyter server based on FastAPI. Allows pretty easy self-hosting (for smaller instances) and thus collaborative editing. Supports Fief authentication, which is neat.
A rundown of the dangers of p-hacking and fishing expeditions when doing data science, and strategies to avoid succumbing to them. Nice, concise talk.
Allows you to code up simple dashboards using a combination of beanql (to select the data) and Python (to operate on it) for fava (the Beancount web GUI).
Super useful!
A 'simple' (i.e. very powerful and complex once you get into the weeds) tool to ingest data and later plot it out again. Creates the kind of plots that are fairly well known from computer science applications: network or drive reads, throughput, CPU usage and so on (e.g. in OpenMediaVault and, I believe, Nextcloud).
A long list of data engineering resources - very focused on 'networking', 'company-driven' efforts and social media opportunities but contains nice book, course and certification recommendations as well.
Documentation of the OpenPLZ API project - a REST API for querying German, Austrian and Swiss postal codes (Postleitzahlen), street names and municipalities (Gemeinden).
Takes much of its data from OpenStreetMap, from which it compiles a full CSV file for Germany here.
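To give a feel for calling such a REST API from Python, here is a minimal sketch using requests; the base URL, route and parameter names are hypothetical placeholders, not taken from the OpenPLZ documentation.

    import requests

    # Hypothetical base URL and route -- check the OpenPLZ API docs for the real ones.
    BASE_URL = "https://example.org/openplz-api"

    def lookup_postal_code(postal_code: str) -> list:
        """Return the localities registered for a given German postal code."""
        response = requests.get(
            f"{BASE_URL}/de/localities",          # placeholder route
            params={"postalCode": postal_code},   # placeholder parameter name
            timeout=10,
        )
        response.raise_for_status()
        return response.json()

    print(lookup_postal_code("10115"))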
(Q)GIS plugin to load German datasets, both national and subnational county-wide ones. Comes with many datasets included for easy consumption.
Very useful spatio-temporal analysis tools within the Python / Jupyter ecosystem.
Can do all sorts of spatial analysis and visualization, but also more specific things like spatial inequality computation (using e.g. the Theil index, Gini, ...); a small illustration follows below.
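As a rough illustration of the kind of inequality measure mentioned above, here is a minimal Gini coefficient computed with plain NumPy over made-up per-region values; it does not use the linked library, whose own API may differ.

    import numpy as np

    def gini(values) -> float:
        """Gini coefficient of non-negative values (0 = perfect equality)."""
        v = np.sort(np.asarray(values, dtype=float))
        n = v.size
        cumulative = np.cumsum(v)
        # Standard closed form based on the sorted values' cumulative sums.
        return (n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n

    # Made-up per-region income figures, purely for illustration.
    regional_income = [12_000, 18_000, 22_000, 40_000, 95_000]
    print(f"Gini: {gini(regional_income):.3f}")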
Or, as the Polars docs recommend:

    import polars as pl

    df = pl.DataFrame(
        {"abc": [-2.5, 5.0], "mno": ["hello", "world"], "xyz": [True, False]}
    )
    with pl.Config(
        tbl_formatting="ASCII_MARKDOWN",
        tbl_hide_column_data_types=True,
        tbl_hide_dataframe_shape=True,
    ):
        print(df)  # prints the frame as a Markdown-style table
Quickly generate visualizations from pandas dataframes, by drag and drop.
Can be really handy for exploratory data viz.
Using altair to visualize data, a book by jjallaire. Nice explanations, starts from the basics.
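For a flavour of what Altair code looks like (not an excerpt from the book, just a minimal sketch over made-up data):

    import altair as alt
    import pandas as pd

    # Tiny made-up dataset.
    df = pd.DataFrame({
        "year": [2019, 2020, 2021, 2022],
        "value": [3.2, 4.1, 3.8, 5.0],
    })

    # Declarative spec: data, then marks, then encodings.
    chart = alt.Chart(df).mark_line(point=True).encode(
        x="year:O",
        y="value:Q",
    )
    chart.save("chart.html")  # or display inline in a Jupyter notebook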
Use SQL queries to solve the murder mystery. Suitable for beginners or experienced SQL sleuths.
Learn SQL with a murder-mystery game. How cool!
Grab data from websites (for now it has pre-made importers for Shaarli and GitHub Awesome lists, but you can create your own), and do stuff with it.
Uses a variety of processors and exporters to shape, fix and transform the imported content. All based on YAML 'pipeline' specifications. Essentially, it's a bit like setting up GitHub Actions / Woodpecker CI for arbitrary pages.
A CLI wrapper combining git and DVC. Instead of doing first dvc commit, then git commit, then individual pushes, you can do it all with one fds commit. Similarly with fds status, which is probably the most used command: get a quick at-a-glance overview of the current project status for both data and code.
By being built as a wrapper it of course still allows delving into the individual programs for more advanced operations. Pretty clever, actually!
Automatically upgrade your Polars code to use the latest syntax available. Run it over Python code that uses Polars and it switches out deprecated calls for up-to-date ones. Simple, easy, nice.
Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and splitting right down the middle of statistical and programmatic learning.
A side-by-side comparison of the Polars and Pandas libraries. Nice gentle comparison and thus simultaneous 'introduction' to the tools.
(It's not really an introduction, though - it does expect you to have some prior knowledge of e.g. the core pandas concepts.)
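As a taste of the kind of contrast such a comparison covers (a generic example of my own, not taken from the linked piece), here is the same group-and-aggregate written in both libraries:

    import pandas as pd
    import polars as pl

    data = {"city": ["Berlin", "Berlin", "Hamburg"], "sales": [10, 20, 5]}

    # pandas: index-centric, eager evaluation.
    pdf = pd.DataFrame(data)
    print(pdf.groupby("city", as_index=False)["sales"].sum())

    # Polars: expression-centric ("group_by" in recent versions), with a lazy API also available.
    pldf = pl.DataFrame(data)
    print(pldf.group_by("city").agg(pl.col("sales").sum()))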
Enables scraping of (and interaction with) websites; a little more high-level and with a different API than BeautifulSoup.
A fantastically easy approach for loading the .jsonl format into pandas.
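Presumably something along these lines - pandas reads newline-delimited JSON directly (the file name is just a placeholder):

    import pandas as pd

    # lines=True tells pandas to treat each line as one JSON object (.jsonl).
    df = pd.read_json("records.jsonl", lines=True)
    print(df.head())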
Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.
An in-depth description of handling strings and textual data with tidy principles. Really neat for applications such as NLP, sentiment analysis, or text modeling.
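The book itself works in R with tidytext, but the core "one token per row" idea carries over to the Python stack used elsewhere in these notes; a rough pandas sketch (my own, not from the book) might look like this:

    import pandas as pd

    docs = pd.DataFrame({
        "doc_id": [1, 2],
        "text": ["tidy data is neat", "text mining made tidy"],
    })

    # One token per row: lower-case, split into words, explode into a long table.
    tidy_tokens = (
        docs.assign(word=docs["text"].str.lower().str.split())
            .explode("word")[["doc_id", "word"]]
    )

    # Ordinary group/count operations now give word frequencies per document.
    print(tidy_tokens.value_counts(["doc_id", "word"]))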