Can convert (and revert) Jupyter notebooks to Markdown and script files, i.e. plain-text files instead of a single JSON file per notebook.
Could be useful for version tracking, or for converting between a Jupyter-centric and a vim-centric data workflow.
Third edition of the well-known data-analysis book for pandas (and NumPy), written by the pandas author.
karlicoss of the data-liberation project HPI explains how best to store and access data moved from various points in the cloud/web to your own drives, and why databases might not always be the best choice.
TL;DR:
Save your grabbed data without any manipulation.
Let the manipulation happen every time you access/interpret the data.
If you have slices of data (mostly time frames), don't try to merge them on disk; save them as separate files and merge them on access/interpretation as well.
You can use databases as an access cache, since the previous points add some overhead to each access.
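The points above can be sketched in Python. This is a minimal illustration, not karlicoss's actual code: the file layout (one JSON list per slice file) and the `timestamp` field are assumptions made for the example. Raw slice files stay untouched on disk; merging and deduplication happen at read time.

```python
import json
from pathlib import Path

def load_events(data_dir):
    """Merge all raw JSON slices on access instead of on disk.

    Each slice file is assumed to hold a list of event dicts;
    overlapping time frames are deduplicated here, at read time,
    so the raw files are never modified.
    """
    seen = set()
    events = []
    for slice_file in sorted(Path(data_dir).glob("*.json")):
        for event in json.loads(slice_file.read_text()):
            key = event["timestamp"]  # assumed unique per event
            if key not in seen:
                seen.add(key)
                events.append(event)
    return sorted(events, key=lambda e: e["timestamp"])
```

If this merge step gets slow, its result is exactly the kind of thing you would cache in a database, per the last point above.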
The easiest answer is with pandas as a library:
import pandas as pd
df = pd.read_json('inputfile.json')
df.to_csv('outputfile.csv', encoding='utf-8', index=False)
read_json converts a JSON string or file to a pandas object (either a Series or a DataFrame).
to_csv can either return a string or write directly to a CSV file; see the docs for to_csv.
Works best when the JSON is an array of structured objects (for unstructured data, see the SO answer in the link).
For additional pandas-to-CSV tips, see this SO thread.
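For JSON with nested objects, pandas' json_normalize can flatten the records before writing CSV. A minimal sketch, with made-up records standing in for a real input file:

```python
import pandas as pd

# Hypothetical nested records, a stand-in for a real JSON file.
records = [
    {"id": 1, "user": {"name": "ada", "city": "london"}},
    {"id": 2, "user": {"name": "bob", "city": "berlin"}},
]

# json_normalize flattens nested objects into dotted column names
# (id, user.name, user.city), which to_csv can then write directly.
df = pd.json_normalize(records)
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a path argument instead, df.to_csv('outputfile.csv', index=False) writes the file directly, as in the snippet above.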
Also, a really generic template you could use is something like this:
1. Find a data blob, an API, or web-scrape a site for raw data you're interested in.
2. Figure out how to store that data. Do you need a relational database, or maybe NoSQL? How will the records be stored, and what does your data model look like?
3. Use analytics packages like numpy, or something else, to draw conclusions or find interesting themes in your data.
4. Now do something with it! Maybe a front end to display it all. You can use Dash to build a quick and light visualization of your findings, or something more full-stack like a Django application or even Flask. Totally up to you.
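The middle of that template can be sketched in a few lines. This is only an illustration: the raw rows are made up (standing in for an API response or scrape), storage is an in-memory SQLite table, and the "analysis" is a single aggregate query.

```python
import sqlite3

# Made-up raw data, standing in for step 1 (API blob or scrape).
raw = [
    ("2024-01-01", "widgets", 3),
    ("2024-01-01", "gadgets", 5),
    ("2024-01-02", "widgets", 7),
]

# Step 2: store the records relationally (here, in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", raw)

# Step 3: a simple aggregate, total quantity per product.
totals = dict(conn.execute(
    "SELECT product, SUM(qty) FROM sales GROUP BY product"
))
print(totals)
```

Step 4 would then hand `totals` to whatever front end you pick (Dash, Django, Flask).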
An interesting use of Loki to grab shell history, store it centrally, and then reuse it from the command line to replace the shell's traditional history functionality. Also includes a little tidbit on integrating your shell history with e.g. Grafana.
This post shows how to figure out the best data layout for InfluxDB v2, some schema design best practices, and a schema development example.
Huge collection of computable, curated data, from demographics to language, science & math, politics, and social media. Many formats: numerical, time series, image, audio, geospatial. Can be exported as simple CSV, or worked with in Python notebooks on the page.
A long list of data sources, divided by general topics, and of varying quality.
Research Guides: Social Science Data Sources & Statistical Methods: Free Data Sources
A collection of free sources of various kinds of data, recommended by EMU.
Grafana is an amazing visualization tool used mainly by IT teams to monitor their infrastructure. As it's open source, there are many community contributions of both data sources and panels.
Working with the Chronicling America API: Chronicling America makes available American newspapers between 1789 and 1963, which means we can explore the contents of the archive through their API. From a search term and the number of results we'd like back, we can build the search URL; the API then returns our data along with the total number of results.
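Building such a search URL is a one-liner with the standard library. The parameter names below (andtext for the search term, rows for the result count, format=json) follow the public Chronicling America search API; if the post uses different ones, treat these as assumptions.

```python
from urllib.parse import urlencode

def search_url(term, rows=20):
    """Build a Chronicling America page-search URL for a term."""
    base = "https://chroniclingamerica.loc.gov/search/pages/results/"
    return base + "?" + urlencode({"andtext": term, "rows": rows, "format": "json"})

url = search_url("suffrage", rows=5)
print(url)
```

Fetching that URL with any HTTP client returns a JSON document whose fields include the matching pages and the total number of results.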
Can send text between any two adjacent vim windows.
Text can be defined by visual selection, motions and text objects.
Tries to position the cursor in a convenient place after each call.
Dot repeatable.
Can be used with Python or R for a data-science REPL experience.
Advanced gnuplot functions.
A simple gnuplot introduction.
The simplest, fastest way to get business intelligence and analytics to everyone in your company (Metabase).
Extensive Python introduction book leaning toward CLI work and data ingestion (JSON, CSV), as well as transformation (working with dates, numbering, etc.).