A useful early-warning-signal computing library which can detect, calculate, and notify you of bifurcations in time series.
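The library's own API isn't shown here, so as a generic illustration of the idea (not this library's interface): rolling variance and lag-1 autocorrelation are two classic early-warning indicators that tend to rise as a system approaches a bifurcation.

import numpy as np
import pandas as pd

# Toy time series; in practice this would be your measured signal.
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=1000)))

# Classic early-warning indicators over a sliding window: both tend
# to increase ahead of a bifurcation.
window = 100
variance = series.rolling(window).var()
autocorr = series.rolling(window).apply(lambda w: w.autocorr(lag=1), raw=False)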
A fantastic commandline tool to quickly get a single view of a CSV file. Auto-spaces, auto-indents, and automatically tries to find the right numeric scale for displaying the file. Quick, easy, sweet!
(Called tidy-viewer both on the Arch Linux AUR and as a command.)
Jupyter notebooks in the terminal. Run complete notebooks from your commandline for exploratory data analysis, before you use something like quarto for more permanent rendering. Seems very neat.
A framework for elegantly configuring complex applications.
Configuration management for python projects; may be useful for storing simple and repeatable configurations for data science projects as well.
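That tagline matches Hydra's, so here is a hedged sketch assuming a Hydra-like setup (the config path and field names are hypothetical):

import hydra
from omegaconf import DictConfig

# Expects a conf/config.yaml next to the script; all names are hypothetical.
@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    print(cfg.model.lr)  # nested values come straight from the YAML

if __name__ == "__main__":
    main()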
Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.
A way to track data alongside code, even if the data lives in different locations, mimicking version control. Seems a little complicated but really useful, especially with additional features like self-contained data pipelines.
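The description matches DVC's tagline; assuming it is DVC, tracked data can also be read programmatically through its Python API (the repo URL, path, and revision below are placeholders):

import dvc.api

# Open a file tracked by DVC at a given Git revision without cloning
# the whole repo; all identifiers here are placeholders.
with dvc.api.open(
    "data/raw/measurements.csv",
    repo="https://github.com/example/project",
    rev="v1.0",  # any Git ref: tag, branch, or commit
) as f:
    header = f.readline()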
Reproducible data science: you create data ingestors, then small modules of transformation (the engineering), then do whatever you want with the data (the science); a generic sketch of this split follows.
Seems quite nice for larger projects, and its standardized nature could save you some time down the road (when forking a project off an existing one or returning to an old project).
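The linked framework's actual API isn't shown here; this is just a generic sketch of the ingest/engineer/science split:

import pandas as pd

def ingest() -> pd.DataFrame:
    # Data ingestor: the only place that knows where raw data lives.
    return pd.read_csv("raw_data.csv")  # hypothetical path

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    # Small, composable transformation module (the engineering).
    return df.dropna().rename(columns=str.lower)

def science(df: pd.DataFrame) -> None:
    # Whatever you want with the clean data (the science).
    print(df.describe())

science(engineer(ingest()))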
An exhaustive book, free and available online, on publishing workflow.
Getting, preparing, and cleaning data; exploratory analysis and modelling with regression; creating reproducible documents with quarto. Seems really nice and worth delving into for data analysis.
Statistical inference, various python plot types, and correlation vs. causation explained in a series of blog posts. Very beginner-friendly, with drawings etc.
Statistics concepts explained in (an attempt at) plain English. That means some nuance will be lost, but it might get you to understand results quicker.
Allows you to collaborate on RMarkdown writing through Google Docs. You will have to use RMarkdown syntax in Google Docs, however, which seems even more cumbersome than plaintext integrations.
As far as I can see from the demonstration, it also won't do anything for better presentation while writing (since nothing is knitted before you download from gdocs again, of course). I don't know how well people would adopt it, then.
RMarkdown for the python world, built on pandoc. This seems like an amazing alternative to the R world's bookdown and blogdown tools (though it includes support for R as well).
A list of resources to delve deeper into data science and/or data engineering. Very interesting suggestions, with enough overlap that it's not just a random list.
Goes over advanced concepts of scraping (with Python); a minimal requests-based sketch of two of the basics follows the list:
- asynchronously loaded pages / client-side rendering (Selenium)
- authentication
- blacklisting
- header inspection
- request frequency
- pattern detection
- honeypots
- captchas, redirects
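A minimal sketch of two of those basics (browser-like headers and throttled request frequency) using requests; the URLs and header value are placeholders:

import time
import requests

session = requests.Session()
# Header inspection: send a browser-like User-Agent instead of the default.
session.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})

urls = ["https://example.com/page/1", "https://example.com/page/2"]
for url in urls:
    resp = session.get(url, timeout=10)
    resp.raise_for_status()
    # ... parse resp.text here ...
    time.sleep(2)  # request frequency: stay polite, avoid blacklisting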
An 8-part series on understanding the python pandas pipeline and concepts.
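If "pipeline" here means the method-chaining style (my guess), the core idea is composing small functions with DataFrame.pipe():

import pandas as pd

def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()

def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # Column names are made up for illustration.
    return df.assign(total=df["price"] * df["quantity"])

raw = pd.DataFrame({"price": [1.0, 2.0, None], "quantity": [3, 4, 5]})
clean = raw.pipe(drop_missing).pipe(add_total)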
Can convert (and revert) jupyter notebooks to markdown and script files (i.e. plaintext files instead of a single JSON file).
Could be useful for data tracking, or for converting between a jupyter-centric and a vim-centric data workflow.
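This sounds like Jupytext; assuming so, the round trip can also be scripted from Python (file names are hypothetical):

import jupytext

# Read an .ipynb and write it back out as a plaintext markdown file...
nb = jupytext.read("analysis.ipynb")
jupytext.write(nb, "analysis.md")

# ...and revert: read the markdown version and save it as a notebook again.
nb2 = jupytext.read("analysis.md")
jupytext.write(nb2, "analysis.ipynb")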
Third edition of the famous data analysis learning book for pandas (and numpy) by the pandas author.
karlicoss of the data liberation project HPI explains how best to store and access data moved from various points in the cloud/web/internet to your drives, and why databases might not always be the best choice.
TL;DR (a generic sketch follows the list):
- Save your grabbed data without any manipulation.
- Let the manipulation happen every time you access/interpret the data.
- If you have slices of data (mostly time frames), don't try to merge them on disk; save them as extra files and merge on access/interpretation as well.
- You can make use of databases for access caching, since the previous points add some overhead to each access.
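A generic sketch of that pattern (paths and field names are made up): the raw export slices stay untouched on disk, and normalization plus merging happen at read time.

import json
from pathlib import Path

RAW_DIR = Path("~/data/exports").expanduser()  # hypothetical location

def events():
    # Merge all raw slices on access; the files themselves never change.
    for path in sorted(RAW_DIR.glob("export-*.json")):
        for item in json.loads(path.read_text()):
            # Interpretation/normalization happens here, at read time.
            yield {"ts": item["timestamp"], "value": item["value"]}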
The easiest answer is with pandas as a library:

import pandas as pd

df = pd.read_json('inputfile.json')
df.to_csv('outputfile.csv', encoding='utf-8', index=False)
read_json converts a JSON string to a pandas object (either a series or dataframe).
to_csv can either return a string or write directly to a csv-file. See the docs for to_csv.
This works best when the JSON is an array of structured objects (for unstructured data, see the SO answer in the link).
For additional pandas-to-csv tips, see this SO thread.
Also, a really generic template you could use is something like this:
1. Find a data blob, an API, or web-scrape a site for raw data you're interested in.
2. Figure out how to store that data. Do you need a relational database, or maybe NoSQL? How will the records be stored, and what does your data model look like?
3. Use analytics packages like numpy or something else, draw conclusions or find interesting themes about your data.
4. Now do something with it! Maybe a front end to display it all. You can use Dash to build a quick and light visualization of your findings, or something more full stack like a Django application or even Flask. Totally up to you.
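For step 4, a minimal sketch with recent Dash (2.x) and made-up data might look like:

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

# Made-up findings to display.
df = pd.DataFrame({"year": [2020, 2021, 2022], "count": [10, 30, 25]})

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Findings"),
    dcc.Graph(figure=px.line(df, x="year", y="count")),
])

if __name__ == "__main__":
    app.run(debug=True)  # app.run superseded run_server in newer Dash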
An interesting use of Loki to grab shell history, store it centrally, and then re-use it from the commandline to replace the shell's traditional history functionality. Also includes a little tidbit on integrating your shell history with e.g. Grafana.