Search: [data_science]

GitHub - DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever want to learn about data engineering

A long list of data engineering resources. It leans heavily towards 'networking', 'company-driven' efforts and social media opportunities, but also contains good book, course and certification recommendations.

data_science · programming · list

November 18, 2024 at 19:27:45 GMT+1 * · permalink

·

https://github.com/DataExpert-io/data-engineer-handbook

·

OpenPLZ API

Documentation of the OpenPLZ API project - REST API for querying German, Austrian and Swiss postal codes (Postleitzahlen), street names and municipalities (Gemeinden).

Takes much of its data from OpenStreetMap, from which it also compiles a full CSV file for Germany.

opensource · api · data_science · dataset

August 21, 2024 at 13:17:23 GMT+2 * · permalink

·

https://www.openplzapi.org/de/

·

QGIS-Plugin: GeoBasis_Loader for German data sets.

QGIS plugin to load German data sets, both national and subnational county-wide ones. Comes with many datasets included for easy consumption.

data_science · opensource

August 21, 2024 at 13:09:13 GMT+2 * · permalink

·

https://geoobserver.de/qgis-plugin-geobasis-loader/

·

GitHub - pysal/pysal: PySAL: Python Spatial Analysis Library Meta-Package

Very useful spatio-temporal analysis tools within the Python / Jupyter ecosystem.

Can do all sorts of spatial analysis and visualization, but also more specific things like spatial inequality computation (using e.g. Theil index, Gini, ...).
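To make the inequality measures mentioned above concrete, here is a minimal plain-Python sketch of the Gini coefficient and the Theil T index. It deliberately does not call PySAL; the function names `gini` and `theil_t` are hypothetical and chosen only for this illustration, and the input is assumed to be a list of non-negative values with a positive sum.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive sum")
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def theil_t(values):
    """Theil T index; zero values contribute nothing to the sum."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive sum")
    mean = total / n
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n


equal = [1, 1, 1, 1]       # everyone has the same amount
unequal = [0, 0, 0, 1]     # one unit holds everything
print(gini(equal), theil_t(equal))
print(gini(unequal), theil_t(unequal))
```

Both measures are zero for the equal distribution. For the fully concentrated one, the Gini coefficient reaches (n - 1) / n and the Theil index reaches ln(n).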

python · data_science · academia

July 24, 2024 at 15:02:32 GMT+2 * · permalink

·

https://github.com/pysal/pysal?tab=readme-ov-file

·

What is the equivalent to df.to_markdown() for a Polars Dataframe?

Or, as the polars docs recommend:

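import polars as pl

df = pl.DataFrame(
    {"abc": [-2.5, 5.0], "mno": ["hello", "world"], "xyz": [True, False]}
)
# Temporarily switch the table format so printing yields a Markdown table.
with pl.Config(
    tbl_formatting="ASCII_MARKDOWN",
    tbl_hide_column_data_types=True,
    tbl_hide_dataframe_shape=True,
):
    print(df)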

python · data_science

July 19, 2024 at 13:15:48 GMT+2 * · permalink

·

https://anonoverflow.frontendfriendly.xyz/questions/78185983/what-is-the-equivalent-to-df-to-markdown-for-a-polars-dataframe

·

GitHub - Kanaries/pygwalker: PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

Quickly generate visualizations from pandas dataframes, by drag and drop.

Can be really handy for exploratory data viz.

python · data_science

June 20, 2024 at 16:07:36 GMT+2 * · permalink

·

https://github.com/Kanaries/pygwalker

·

Visualization Curriculum - Data visualizations with Altair

Using Altair to visualize data, a book by jjallaire. Nice explanations that start from the basics.

python · data_science · tutorial

January 10, 2024 at 11:10:15 GMT+1 * · permalink

·

https://jjallaire.github.io/visualization-curriculum/

·

The SQL Murder Mystery

Use SQL queries to solve the murder mystery. Suitable for beginners or experienced SQL sleuths.

Learn SQL with a murder-mystery game. How cool!

sql · data_science · tutorial

January 10, 2024 at 10:46:06 GMT+1 * · permalink

·

https://mystery.knightlab.com/

·

GitHub - nodiscc/hecat: Generic automation tool around data stored as plaintext YAML files

Grab data from websites (for now it has pre-made importers for Shaarli and GitHub Awesome lists, but you can create your own), and do stuff with it.

Uses a variety of processors and exporters to shape, fix and transform the imported data, all based on YAML 'pipeline' specifications. Essentially, it's a bit like setting up GitHub Actions/Woodpecker CI for arbitrary pages.

hosting · python · data_science

January 10, 2024 at 09:30:25 GMT+1 * · permalink

·

https://github.com/nodiscc/hecat

·

GitHub - DagsHub/fds: Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

A CLI wrapper combining git and dvc. Instead of running dvc commit, then git commit, then the individual pushes, you can do it with a single fds commit.

The same goes for fds status, probably the most used command: it gives a quick at-a-glance overview of the current project status for both data and code.

Because it is built as a wrapper, you can still drop down into the individual programs for more advanced operations. Pretty clever, actually!

data_science · git · commandline

January 8, 2024 at 16:58:53 GMT+1 * · permalink

·

https://github.com/DAGsHub/fds

·

GitHub - MarcoGorelli/polars-upgrade: Automatically upgrade your Polars code to use the latest syntax available

Run it over Python Polars code and it swaps deprecated syntax for the up-to-date equivalent. Simple, easy, nice.

python · polars · data_science

December 13, 2023 at 09:03:52 GMT+1 * · permalink

·

https://github.com/MarcoGorelli/polars-upgrade

·

Causal Inference in R

Learning to model causality and make causal inferences with R (though applicable to other data science toolkits). Very nice, and split evenly between statistical and programming material.

statistics · data_science

December 9, 2023 at 19:50:45 GMT+1 * · permalink

·

https://www.r-causal.org/

·

Modern Polars

A side-by-side comparison of the Polars and pandas libraries. A nice, gentle comparison and thus a simultaneous 'introduction' to both tools.

(It's not really an introduction: it expects some prior knowledge of e.g. the pandas core concepts.)

python · data_science · tutorial

December 9, 2023 at 19:47:24 GMT+1 * · permalink

·

https://kevinheavey.github.io/modern-polars/

·

GitHub - MechanicalSoup/MechanicalSoup: A Python library for automating interaction with websites.

Enables scraping of (and interaction with) websites; a little more high-level than BeautifulSoup, with a different API.

python · data_science

December 9, 2023 at 09:26:53 GMT+1 * · permalink

·

https://github.com/MechanicalSoup/MechanicalSoup

·

A relatively faster approach for reading json lines file into pandas dataframe

Fantastically easy way to load the .jsonl format into pandas.
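As a rough illustration of what loading .jsonl involves, the sketch below parses one JSON object per line using only the standard library. It is not necessarily the approach taken in the linked article; the helper name `read_json_lines` and the sample records are made up for this example.

```python
import io
import json


def read_json_lines(handle):
    """Parse one JSON object per non-empty line into a list of dicts."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc
    return records


# Hypothetical sample data standing in for a real .jsonl file.
sample = io.StringIO('{"id": 1, "tag": "python"}\n\n{"id": 2, "tag": "sql"}\n')
records = read_json_lines(sample)
print(records)
```

With pandas installed, `pandas.DataFrame(records)` turns the resulting list into a DataFrame, and `pandas.read_json(path, lines=True)` is the built-in one-call alternative.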

JSON · python · data_science

December 9, 2023 at 09:02:30 GMT+1 * · permalink

·

https://sundararamanp.medium.com/a-relatively-faster-approach-for-reading-json-lines-file-into-pandas-dataframe-90b57353fd38

·

1 The tidy text format | Text Mining with R

Using tidy data principles is a powerful way to make handling data easier and more effective, and this is no less true when it comes to dealing with text.

An in-depth description of handling strings and textual data with the tidy principles. Really neat for applications such as NLP, sentiment analysis or text modeling.

data_science

December 5, 2023 at 08:58:05 GMT+1 * · permalink

·

https://www.tidytextmining.com/tidytext

·