Weekly Shaarli


All links of one week in a single page.


Week 12 (March 22, 2021)

WinWorld: Microsoft Word 5.x (UNIX)

An archive of Word versions, from Windows through Mac and MS-DOS to Unix. Very interesting.

Grammar check (and more) in LaTeX documents

Yesterday was a great day for the Vim universe, especially if you write as many LaTeX documents as I do: a new version of VimTeX was released.
This gave me a reason to dive into its documentation once again and so I found a feature that I didn't even know existed: a grammar check in LaTeX.

How it works

To use this, you will need to download the following software:

Language Tool

For some years, there has been an amazing free software project to check natural languages: Language Tool (LT).
However, it works on plain text files and can't filter out the LaTeX markup.

YaLafi

Enter Yet Another LaTeX filter (YaLafi).
This tool extracts the plain text and performs the grammar check using LT, but it keeps track of the original position of the text!

VimTeX integration

YaLafi provides a Vim compiler called vlty (I don't know where this name comes from) to use these tools, and VimTeX makes configuration ridiculously easy: see :h vimtex-grammar-vlty.
Once everything is installed and you have pointed VimTeX to the directory containing the LT .jar file, you can populate and open the quickfix list as follows:

:compiler vlty | make | copen

(It takes a few seconds.)

Issues

Now, obviously I have only been trying this out for a day, but so far I'm impressed, to say the least.
However, in my opinion the YaLafi documentation does not always make it easy to get started with:

how to handle multi-file documents
how to suppress certain warnings
how to handle user-defined macros
...

Getting to know both LT and YaLafi better will probably solve my problems.

Conclusion

Awesome software:

Vim
VimTeX
YaLafi
Language Tool

I hope some of you can integrate this into your workflow, and maybe this post inspires you to open some PRs for YaLafi or to write tutorials that help less technical users like me use the advanced options.

reddit permalink

[D] stupid question: why doesn't deep learning always outperform classical machine learning and statistical models?

A colleague was saying the other day that deep learning models are like sports cars: they need a minimum distance to accelerate before they can reach their top speed. In the same way, deep learning models don't perform well on smaller data sets where there is no room (i.e. not enough data) to rev their engine. That's why a mountain bike (e.g. a CART decision tree) can navigate a forest trail where a Ferrari (e.g. a convolutional neural network) cannot.

I really liked my colleague's analogy, but is there any mathematical theory to support what they are saying? Are complex models (e.g. neural networks, SVMs) naturally (through their mathematical architecture) more susceptible to overfitting than a logistic regression or decision tree when exposed to smaller data? I feel there is an unspoken rule: "in general, use complicated models on complicated data". But is there any mathematical justification to support this?

I understand that sometimes deep learning models perform poorly because the analyst might not know how to use them properly (e.g. hyperparameter tuning) - but this doesn't reflect the model itself.

I know there is a theorem called the "no free lunch theorem" that shows that, by default, "there is no single best algorithm for all problems" - but can this theorem be used to somehow justify that smaller datasets don't require complex models? I.e. is there some way to show that more complex models (e.g. suppose we quantify model complexity through the VC dimension) don't necessarily produce lower generalization error on smaller datasets?

So, given a very powerful computer that can simultaneously consider millions of hyperparameter combinations: can it be statistically shown that more complex models are not necessarily better for smaller data sets (e.g. iris data)?

Thanks

reddit permalink

GitHub - almarklein/timetagger: Tag your time, get the insight

Tag your time, get the insight. An alternative to traggo: pretty similar, but based on #tags instead of key:value pairs.
More mature interface and reporting functionality, less extensive dashboarding possibilities.

productivity python

YBlog - Learn Vim Progressively

Simple and elegant progressive vim tutorial - starts out with very basic information and then becomes progressively more 'difficult', ending up with some basic macros and block visual selection tips.

Should be a good one for introducing new people to vim.

vim

The ecosystem of the Go programming language

reddit permalink

Find the commit that introduced a bug in your code: how to use git bisect in 7 steps – flavia bastos

git bisect usage:

git bisect start > puts you into bisect mode

git bisect bad <commitref|HEAD> > signal to git the commit where things definitely don't work right anymore (mostly HEAD)

git bisect good <commitref|HEAD> > signal to git a commit where things definitely still worked (can be any number of commits back).

You will now be put into a special bisect inspection mode, and git will check out various commits in a divide-and-conquer manner for you to declare their status.

git bisect good|bad > After checking the state of the app/running tests/whatever, declare the commit as working or not-working. Git will then move you to the next commit until you find the right one introducing the bug.

git bisect reset > get out of bisect mode
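The divide-and-conquer search behind these steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not git's actual implementation: the history is treated as a linear list ordered oldest to newest (real histories are graphs), the first entry is known good, the last is known bad, and `find_first_bad`, `history`, and the commit ids are hypothetical names made up for the example.

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history (oldest to newest).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2          # the commit "checked out" for testing
        if is_bad(commits[mid]):
            bad = mid                    # like answering `git bisect bad`
        else:
            good = mid                   # like answering `git bisect good`
    return commits[bad]


# Hypothetical history: ten commits, with the bug introduced in "c6".
history = [f"c{i}" for i in range(10)]
culprit = find_first_bad(history, lambda commit: int(commit[1:]) >= 6)
print(culprit)
```

Each answer halves the remaining range, so even a long history needs only a handful of checks.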

git

regexone - learning regex through challenges

@danielhglus Does this count? https://regexone.com/ is a list of lessons where you go through and defeat challenges by making regex pass test cases. It starts really simple and goes pretty far. I don't think it does regex-but-not-regex stuff like negative lookahead and stuff though.
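For readers unfamiliar with the negative lookahead mentioned above, here is a minimal sketch using Python's standard `re` module. The pattern and sample text are made up for illustration and are not taken from regexone.

```python
import re

# (?!u) is a negative lookahead: it succeeds only if the next character
# is NOT "u", and it consumes no input itself.
q_without_u = re.compile(r"\bq(?!u)\w*")

words = q_without_u.findall("queen qatar quick qi")
print(words)
```

Only the words where "q" is not followed by "u" are matched.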

mastodon statuslink

List of JSON tools for command line

reddit permalink

100+ of the Best Free Data Sources For Your Next Project

A long list of data sources, divided by general topics, and of varying quality.

data_science list

How to Use Generators and yield in Python – Real Python

Another, gentler and longer, introduction to the idea of Python generators (yield).
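A minimal sketch of the idea, not taken from the article: a function containing `yield` returns a generator that produces values lazily, one at a time, and keeps its local state between values. The function names `countdown` and `fibonacci` are chosen for illustration.

```python
from itertools import islice
from typing import Iterator


def countdown(start: int) -> Iterator[int]:
    """Yield start, start - 1, ..., 1 and then stop."""
    while start > 0:
        yield start          # execution pauses here until the next value is requested
        start -= 1


def fibonacci() -> Iterator[int]:
    """Yield Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice takes only the first eight values from the infinite generator.
first_eight = list(islice(fibonacci(), 8))
print(list(countdown(3)), first_eight)
```

Because values are computed on demand, the infinite `fibonacci` generator never builds a full list in memory.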

python

How can I find out which users are in a group within Linux? - Unix & Linux Stack Exchange

Different ways of listing group members on Linux. The answer often hinges on the distinction between a user's primary group and their additional (supplementary) groups.

Easiest way (if the shadow package is installed): groupmems -g <group> -l.
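The primary/additional distinction can be sketched with Python's Unix-only `grp` and `pwd` modules. This is an illustration, not one of the answers from the linked page: `group_members` is a hypothetical helper, the demo entries are fake, and on systems with network user databases the enumeration calls may not list every account.

```python
import grp
import pwd
from types import SimpleNamespace


def group_members(name, groups=None, users=None):
    """List users whose primary or additional group is `name`."""
    # Fall back to the local system databases when no entries are passed in.
    groups = grp.getgrall() if groups is None else groups
    users = pwd.getpwall() if users is None else users

    group = next((g for g in groups if g.gr_name == name), None)
    if group is None:
        raise KeyError(f"no such group: {name}")

    # Additional members are listed in the group entry itself ...
    members = set(group.gr_mem)
    # ... while primary membership is only recorded in each user's entry.
    members.update(u.pw_name for u in users if u.pw_gid == group.gr_gid)
    return sorted(members)


# Fake entries for illustration: alice is an additional member of "dev",
# bob has "dev" (gid 1001) as his primary group.
fake_groups = [SimpleNamespace(gr_name="dev", gr_gid=1001, gr_mem=["alice"])]
fake_users = [
    SimpleNamespace(pw_name="alice", pw_gid=1000),
    SimpleNamespace(pw_name="bob", pw_gid=1001),
]
demo = group_members("dev", fake_groups, fake_users)
print(demo)
```

Looking only at the group entry would miss bob, which is why tools that read just the group database can give incomplete answers.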

linux

A Low Poly Art Tutorial in Python

Combine Python, SciPy and pygame to turn wallpapers into low poly art images and animations.

python design

Motivating yourself for side projects

Can I just say as a neuroscientist this is not your fault. Basically we think we have control over what we do but this is an illusion. For example you want to work on your project but you never do. So then you feel shame/guilt etc which only makes you more unproductive.

The solution to this is that the mind behaves more like a computer than we think. If you know how to properly interact with it you can make it do whatever you want. Now there is a long list of behavioural psychology focused on productivity but I will start you off with one thing.

Right now, create a list. It can be on your computer, on a website like trello.com, or on paper; it doesn't matter. On it, write 6 things that you can accomplish very quickly in relation to your project.

For example, the list could be this:

make a project directory for my project
download the dataset needed
install required tools for project
write first variable
write first function
make the first graph

Set the commitment to do just one of these things per day; you don't have to do any more.

Try adding new goals to your list as you complete old ones.

The goals should be easy to achieve: 1 to 30 minutes each.

Pretty soon you will be doing more than just one task.

This method efficiently uses your brain's reward system. Doing small, clearly defined tasks with low commitment is easy and generally fun to do.

Doing a large complicated project with no clear approach is not fun to do.

There are tonnes of efficiency hacks and every person is different. Good luck.

reddit permalink

life productivity

notcurses - character graphics and TUI library comes with some handy cli apps

Notcurses is a character graphics and TUI library; actually a promising ncurses alternative. It was mentioned (1, 2) already in this subreddit.

Besides being a library it comes with some handy apps which you may find interesting:

ncls: an ls that displays multimedia in the terminal
ncneofetch: a neofetch ripoff
ncplayer: renders visual media (images/videos)
nctetris: a tetris clone
notcurses-demo: some demonstration code
notcurses-input: decode and print keypresses

reddit permalink

GitHub - edwardtufte/tufte-css: Style your webpage like Edward Tufte’s handouts.

Style your webpage like Edward Tufte’s handouts.

Uses a variety of css rules (embedding stuff in e.g. article, figure, checkbox tags) to emulate Tufte's visual design,
with margin notes, side-notes, and image positioning close to its text.

Some implementations seem a bit awkward (the way sidenotes are declared, the iframe wrapper introduced) and the link-underline text-shadow hack seems pretty bad (especially when restyling pages for dark mode), but it seems nice for inspiration.

Example page here.

css

Socat – A utility for data transfer between two addresses (2018)

reddit permalink

Wolfram Data Repository: Computable Access to Curated Data

Huge collection of computable, curated data from demographics to language, science & math, politics, social media. Many formats: numerical, time series, image, audio, geospatial. Can be exported as simple CSV, or worked with in Python notebooks on the page.

data_science list

Free Data Sources - Social Science Data Sources & Statistical Methods - Research Guides at Eastern Michigan University

Research Guides: Social Science Data Sources & Statistical Methods: Free Data Sources

A collection of free sources of various kinds of data, on recommendation-basis from EMU.

data_science list

“10 Algorithms To Solve Before your Coding Interview” - great little repo, I’d say it’s a mix of leetcode easy with a few leetcode medium problems mixed in.

reddit permalink

Oooh, DeepL has added 13 European languages! Including Danish, Swedish, Hungarian, …

DeepL quality consistently outperforms Google Translate. It's also great because DeepL doesn't use English as common ground between non-English languages. https://www.deepl.com/blog/20210316.html

mastodon statuslink

gnu screen - tmux multiple instances of the same session - Super User

Attaching to the same tmux session multiple times - and using different windows and views into it.

commandline

Practical Go Lessons Book: 700+ pages, 41 chapters, 405+ drawings

reddit permalink

SpotifyPod RSFlightronics DIY

Build log of SpotifyPod: Spotify in a 4th-gen iPod (2004), made using an RPi Zero and reusing the original clickwheel. Schematics and 3D files are included!

Really nice build and update of vintage iPod.

diy

happyDNS / happyDNS · GitLab

Self-hosted DNS routing and service, offering both simple, painless routing and advanced options, e.g. REST API access.

Seems quite young still, however, and does not support a lot of domain name registrars yet.

hosting api

15 Grafana vis you probably didn’t know was possible with these 3 plugins | by CrashLaker | Medium

Grafana is an amazing visualization tool used mainly by IT teams to monitor their infrastructure. As it’s open-source there’s huge contribution from the community on both datasource and panel making…

data_science monitoring

GitHub - axkibe/lsyncd: Lsyncd (Live Syncing Daemon) synchronizes local directories with remote targets

lsyncd can act as a repeatable rsync replacement. Under the hood it uses rsync (though there is an advanced rsyncssh integration available which does not re-transfer files over rsync that already exist on the target machine).

Ideal for local-remote scenarios, where changes occur on one machine and should be replicated on another (e.g. mirroring a project directory and code changes, automatically pushing them to a remote development environment for compilation/testing/building).

DigitalOcean guide here

ssh hosting programming