Last active
February 7, 2023 10:41
Conversations between computers
SIRKULATOR overall plan
1 metadata import/export roughly working, incl. OAI provider
2 job scheduler/runner, with some examples: harvesting snl and wikidata descriptions for persons, reindexing, oai harvesting
3 items and circulation, incl. borrowers/users/staff/bot agents
4 public-facing website for catalog, search and explore, but no ordering yet
5 ncip/sip2 support
6 stress test/integration test
* rename internal and external descriptions - descriptions? texts about? (+count)
* merge wikidata, wikipedia, snl, isbnforlag into a unified Publisher struct for import: do one letter every now and then! approx. 200 publishers?
* prevent jobs from locking tables (e.g. try harvest_nb_links and update_snl_descriptions while importing metadata)
* Avoid having both has_contributor with the publisher (Forlag/utgiver) role and a published_by relation, e.g. 978-82-691039-2-2
* Avoid duplicate main_entry+contributor, e.g. 82-05-17893-3
* "Ok, saved" does not get the right language.
* internal and external descriptions (count)
* publisher publications (count) + pagination
* Resource Events, use for publishers (founded, merged, acquired, closed down etc.)
* person: convert to character (e.g. apollon, in 978-82-419-1999-2)
* search and connect component - try with a web component
* hx-extension: hx-err-target + hx-swap-on-err (400/500 statuses)
* jobruns: filter select: on [name] [status]
* jobruns: pagination?
* hx-extension: run arbitrary js function after hx-swap
* oai harvester: archived_at not set, deleted upsert working?
* index tags (dewey terms, publication audience/genre etc.)
* bluge analyzer tokenfilter strip '--' from dewey labels
* janitor job: if relation published_by.label not found in publisher.nameVariants - add it.
* save corporation
* save publication
* translated title: https://bibsys.alma.exlibrisgroup.com/view/sru/47BIBSYS_NETWORK?version=1.2&operation=searchRetrieve&recordSchema=marcxml&query=alma.isbn=9788203264450
* 653 subjects
* vocab/gender/gender.go -> func Options(lang) [2]string
* person and corporation pages are the same, apart from PersonForm vs CorporationForm, and ResourceType of course
=> agent_page?
Next:
* delete identifier, add identifier
undo?
CREATE TABLE undo (
    id INTEGER PRIMARY KEY,
    q  TEXT NOT NULL,    -- the statement that reverses the change, ex: "INSERT OR IGNORE INTO link (from_id, type, id) VALUES ('a','b','c')"
    at INTEGER NOT NULL  -- when the change was made
)
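A minimal Go sketch of how such an undo table might be used, assuming each row holds the SQL statement that reverses one change. The UndoLog type and its Record and Pop methods are hypothetical and keep the rows in memory instead of in the database:

```go
package main

import (
	"fmt"
	"time"
)

// UndoEntry mirrors one row of the undo table: the SQL statement that
// reverses a change, and when the change was made (Unix seconds).
type UndoEntry struct {
	ID int64
	Q  string
	At int64
}

// UndoLog is a hypothetical in-memory stand-in for the undo table.
type UndoLog struct {
	entries []UndoEntry
	nextID  int64
}

// Record stores the statement that would reverse a change made at time at.
func (l *UndoLog) Record(q string, at int64) {
	l.nextID++
	l.entries = append(l.entries, UndoEntry{ID: l.nextID, Q: q, At: at})
}

// Pop removes and returns the most recent entry; ok is false if the log is empty.
func (l *UndoLog) Pop() (e UndoEntry, ok bool) {
	if len(l.entries) == 0 {
		return UndoEntry{}, false
	}
	e = l.entries[len(l.entries)-1]
	l.entries = l.entries[:len(l.entries)-1]
	return e, true
}

func main() {
	var undo UndoLog
	// After deleting a link, record the INSERT that would restore it.
	undo.Record("INSERT OR IGNORE INTO link (from_id, type, id) VALUES ('a','b','c')", time.Now().UTC().Unix())
	if e, ok := undo.Pop(); ok {
		fmt.Println("to undo, run:", e.Q)
	}
}
```

Popping the newest entry first gives last-in, first-out undo, which is probably the least surprising order for "delete identifier / add identifier".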
* Search and connect "published by"
Not sortable if length of table is 1 | |
Connect resource dialog: | |
https://blog.benoitblanchon.fr/django-htmx-modal-form/ | |
CREATE TABLE resource_event (
    resource_id TEXT,
    at          INTEGER NOT NULL,           -- timestamp, year, date,
    type        TEXT NOT NULL,              -- what happened: prize won, birthdate etc
    data        JSON NOT NULL DEFAULT '{}',
    PRIMARY KEY (resource_id, at)
)
type ResourceEvent struct {
	ResourceID string
	At         time.Time
	Type       string // merged_with, replaced_by, prize_nominee, prize_win
	Data       map[string]interface{}
}
type Resource struct {
	Timeline []ResourceEvent
}
relations: merged_with (another publisher)
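A small sketch of how a Resource's Timeline could be built from unordered events, assuming the timeline should simply be sorted oldest first. The Timeline function, the year helper, the "founded" event type and the sample data are all made up for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// ResourceEvent follows the struct sketched above.
type ResourceEvent struct {
	ResourceID string
	At         time.Time
	Type       string
	Data       map[string]interface{}
}

// Timeline returns a copy of events sorted oldest first.
func Timeline(events []ResourceEvent) []ResourceEvent {
	out := append([]ResourceEvent(nil), events...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].At.Before(out[j].At) })
	return out
}

// year is a helper for events where only the year is known.
func year(y int) time.Time {
	return time.Date(y, time.January, 1, 0, 0, 0, 0, time.UTC)
}

func main() {
	// Hypothetical events for a publisher with ID "p1".
	events := []ResourceEvent{
		{ResourceID: "p1", At: year(1990), Type: "merged_with", Data: map[string]interface{}{"other": "p2"}},
		{ResourceID: "p1", At: year(1925), Type: "founded"},
	}
	for _, e := range Timeline(events) {
		fmt.Println(e.At.Year(), e.Type)
	}
}
```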
NEXT
* prevent double author relation to a publication. drop main-entry?
ex: https://bibsys.alma.exlibrisgroup.com/view/sru/47BIBSYS_NETWORK?version=1.2&operation=searchRetrieve&recordSchema=marcxml&query=alma.isbn=9788202722319
* search pagination
* store indexed_at timestamp
* parse identifiers (ean/isbn/issn)
* etl: persons without ID, create a review? e.g. ISBN 8205267995
* import: bug with "already in catalog" when there are several ISBNs: ISBN 9788380043220 ISBN 9788380043466
* parse 245c to resource/relation
want = []sirkulator.Relation{
	{
		Type: "contributes_to",
		Data: map[string]interface{}{"role": "aut", "name": "Torbjørn Egner"},
	},
	{
		Type: "contributes_to",
		Data: map[string]interface{}{"role": "trl", "name": "Lars Fiske", "note": "Nynorsk"},
	},
}
et al. = m.fl.
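For the "parse identifiers (ean/isbn/issn)" item above, a sketch of ISBN check-digit validation using the standard ISBN-10 (weights 10 to 1, modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) rules. The function names are hypothetical, and ISSN and other EAN prefixes are left out:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanISBN strips hyphens and spaces and upper-cases a trailing "x".
func cleanISBN(s string) string {
	s = strings.NewReplacer("-", "", " ", "").Replace(s)
	return strings.ToUpper(s)
}

// ValidISBN10 reports whether s has 10 characters and a correct check digit.
func ValidISBN10(s string) bool {
	s = cleanISBN(s)
	if len(s) != 10 {
		return false
	}
	sum := 0
	for i := 0; i < 10; i++ {
		c := s[i]
		var d int
		switch {
		case c >= '0' && c <= '9':
			d = int(c - '0')
		case c == 'X' && i == 9: // X means 10, only allowed as check digit
			d = 10
		default:
			return false
		}
		sum += (10 - i) * d
	}
	return sum%11 == 0
}

// ValidISBN13 reports whether s has 13 digits and a correct check digit.
func ValidISBN13(s string) bool {
	s = cleanISBN(s)
	if len(s) != 13 {
		return false
	}
	sum := 0
	for i := 0; i < 13; i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if i%2 == 1 {
			d *= 3
		}
		sum += d
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(ValidISBN10("82-05-17893-3"))
	fmt.Println(ValidISBN13("9788202722319"))
}
```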
// DownloadImage tries to download an image from each of the given urls,
// stopping as soon as one image is successfully fetched. It returns
// the image along with the url it came from.
// * urls is assumed to be sorted according to priority.
// (needs imports: errors, io, net/http)
func DownloadImage(urls []string) ([]byte, string, error) {
	for _, url := range urls {
		resp, err := http.Get(url)
		if err != nil {
			continue // try the next candidate
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil || resp.StatusCode != http.StatusOK {
			continue
		}
		if http.DetectContentType(b) != "image/jpeg" {
			continue
		}
		return b, url, nil
	}
	return nil, "", errors.New("no image could be downloaded")
}
package http
func Download(url string) ([]byte, error)
func DownloadTo(url string, w io.Writer) error
package etl
type WebTarget struct {
	Name      string
	URL       string
	Ingestion func(body io.ReadCloser) (Ingestion, error)
}
type SPARQLTarget struct {
	Name      string
	URL       string
	Query     string
	Ingestion func(graph rdf.Graph) (Ingestion, error)
}
{
	Name: "britishlibrary",
	URL:  "https://bnb.data.bl.uk/sparql",
	Query: `
PREFIX bibo: <http://purl.org/ontology/bibo/>
DESCRIBE ?book WHERE { ?book bibo:isbn13 "978-0-593-13677-5" . }
`,
	Ingestion: func
}
var httpLookupsByID = map[string][]Target{
	"isbn": {
		{
			Name: "bibsys/sru",
			URL:  "https://bibsys.alma.exlibrisgroup.com/view/sru/47BIBSYS_NETWORK?version=1.2&operation=searchRetrieve&recordSchema=marcxml&query=alma.isbn=8273504166",
		},
		{
			Name: "open_library",
			URL:  "https://openlibrary.org/isbn/%s.json",
		},
		{
			Name: "worldcat",
			URL:  "https://www.worldcat.org/search?q=isbn%3A%s",
		},
		{
			Name: "gcd",
			URL:  "https://www.comics.org/isbn/%s/",
		},
	},
	// "issn": (not filled in yet)
	// "viaf": (not filled in yet)
}
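A note on filling in the "%s" placeholders: passing the worldcat template to fmt.Sprintf would not work as intended, because fmt reads the "%3A" in the URL as a formatting verb. A minimal sketch that substitutes the placeholder with plain string replacement instead (lookupURL is a hypothetical helper, and the identifier is assumed to be already normalized):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// lookupURL fills the single "%s" placeholder in tmpl with id.
// It deliberately avoids fmt.Sprintf, which would misread
// percent-encoded sequences such as "%3A" as formatting verbs.
func lookupURL(tmpl, id string) string {
	return strings.Replace(tmpl, "%s", url.QueryEscape(id), 1)
}

func main() {
	fmt.Println(lookupURL("https://www.worldcat.org/search?q=isbn%3A%s", "9788202722319"))
	fmt.Println(lookupURL("https://openlibrary.org/isbn/%s.json", "9788202722319"))
}
```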
LATER
- update existing by isbn? go fetch data and see what's new/different
- enrich resource jobs: getting description from wikidata, snl
green-background: #ccffd8 | |
green-background-em: #abf2bc | |
red-background: #ffebe9 | |
red-background-em: #fe8282 | |
https://bibsys.alma.exlibrisgroup.com/view/oai/47BIBSYS_NETWORK/request?verb=GetRecord&identifier=oai:urm_publish:999921380896302201&metadataPrefix=marc21 | |
https://www.niso.org/schemas/ncip/v2_02/ncip_v2_02.xsd | |
SELECT 'R' || lower(hex(randomblob(4))) || strftime('%s','now'); | |
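The same ID scheme as the SQLite expression above (a prefix, 4 random bytes as lowercase hex, then the Unix time in seconds) could be generated on the Go side. A sketch, where newID is a hypothetical helper:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strconv"
	"time"
)

// newID returns prefix + 8 lowercase hex characters (4 random bytes)
// + unix, the Unix time in seconds, mirroring
// 'R' || lower(hex(randomblob(4))) || strftime('%s','now').
func newID(prefix string, unix int64) (string, error) {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return prefix + hex.EncodeToString(b[:]) + strconv.FormatInt(unix, 10), nil
}

func main() {
	id, err := newID("R", time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```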
https://git.sr.ht/~mariusor/wrapper/tree/master/item/examples/main.go | |
https://github.com/thanos-io/thanos/blob/main/docs/contributing/coding-style-guide.md | |
Deliberate choices and priorities:
the expert system (intra) is
* not adapted for mobile, requires a regular large screen
* supports only recent browsers (tested in Chrome and Firefox)
the frontend is:
* adapted for mobile
* supports "all" browsers
GO HYGIENE FACTORS
https://dave.cheney.net/practical-go/presentations/gophercon-israel.html | |
===== | |
https://explained-from-first-principles.com/email/ | |
DEB PACKAGING / SYSTEMD | |
====================== | |
https://blog.knoldus.com/create-a-debian-package-using-dpkg-deb-tool/ | |
https://mgdm.net/weblog/systemd/ | |
https://old.reddit.com/r/golang/comments/rcebag/zero_downtime_restarts_and_deploys_using_systemd/ | |
Backup | |
====== | |
https://news.ycombinator.com/item?id=29209455 | |
https://archive.md/1jHmP#selection-381.0-385.28 | |
METRICS
=======
Measuring the right and important things?
* Surrogate variable that is measured: the thing we measure because it is measurable | |
* Variable of true or greater interest: the thing we actually want to know about | |
* Measurement technique of surrogate variable: whether we have the ability to get the actual direct value, or whether it is rather inferred from other observations | |
* Artefactual influences: what are the things that can mess up the data in measuring it | |
* Certainty of "Normal Range": how sure are we that the value we read is representative of what we care about? | |
The actual value is not in the metric nor the alert, but in the reaction that follows. They're a great trigger point for more meaningful things to happen, and maintaining that meaningfulness should be the priority. | |
https://ferd.ca/plato-s-dashboards.html | |
DATA SOURCES
============
https://data.norge.no/datasets/cdbe6acc-573f-48bc-9808-46bf538fcf30 | |
https://bibliotekutvikling.no/kunnskapsorganisering/ | |
https://www.oclc.org/content/dam/research/publications/2020/oclcresearch-transitioning-next-generation-metadata-a4.pdf | |
https://ns.editeur.org/thema/nb | |
https://www.editeur.org/files/Thema/1.4/Thema_v1.4_nb/Thema_v1.4.2_nb.html | |
Music:
Nordic literature
http://runeberg.org/authors/ | |
http://runeberg.org/search.pl?born=1949 | |
Comics:
http://www.minetegneserier.no/pls/htmldb/f?p=100:3:22511890183313::NO::P3_SERIER_ID:1339&cs=11AF7B0B0CF11D08FBA345BA65068C848 | |
https://www.comics.org/publisher/1609/ | |
https://beta.comics.org/series/49089/covers/ | |
History
https://www.norgeshistorie.no/ | |
Localization, languages: Norwegian + English
https://phrase.com/blog/posts/internationalization-i18n-go/ | |
https://angelika.me/2021/11/23/7-gettext-lessons-after-2-years/ | |
DEV env live reload | |
https://news.ycombinator.com/item?id=28015798 | |
NB LIST OF INTEGRATIONS FOR NORWEGIAN LIBRARIES
https://bibliotekutvikling.no/kunnskapsorganisering/krav-til-biblioteksystem-integrering-mot-nasjonale-tjenester/ | |
OPERATIONS / HOSTING
====================
https://specbranch.com/posts/one-big-server/ | |
https://news.ycombinator.com/item?id=29471986 | |
SECURITY
========
https://github.com/FiloSottile/age | |
STRIDE threat modeling
https://en.m.wikipedia.org/wiki/STRIDE_(security) | |
https://lobste.rs/s/onn8vc/simple_things_are_actually_hard_user | |
Authentication: https://news.ycombinator.com/item?id=29761728
https://www.youtube.com/watch?v=10Qj0eYqbuo | |
PRIVACY / ANONYMISERING | |
======================= | |
https://www.youtube.com/watch?v=RNykMU7wF7s | |
Ontology, metadata, libraries, RDF
==================================
https://news.ycombinator.com/item?id=28710081 | |
http://digitalcuration.umaine.edu/resources/shirky_ontology_is_overrated.pdf | |
https://news.ycombinator.com/item?id=29141800 | |
SQL database modeling
=====================
Never delete rows referenced in other tables, use ON DELETE RESTRICT
All timestamps are stored as UTC
All application log timestamps are also UTC
slug timestamp IDs: https://news.ycombinator.com/item?id=34436625
Recursion: | |
https://news.ycombinator.com/item?id=28018058 | |
https://nessuent.xyz/posts/2021-07-18_detecting_cycles.html | |
Store binary data (images) in a blob or on the filesystem?
https://news.ycombinator.com/item?id=14550060 | |
window functions | |
https://medium.com/analytics-and-data/the-versatility-of-row-number-one-of-sqls-greatest-functions-53ec78e74096 | |
https://learnsql.com/blog/get-to-know-the-power-of-sql-recursive-queries/ | |
https://www.startdataengineering.com/post/6-concepts-to-clearly-understand-window-functions/ | |
sql pro con | |
https://news.ycombinator.com/item?id=27791539 | |
actual time vs record time | |
https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/ | |
triggers etc.
https://brandur.org/fragments/code-database-vs-app | |
SQLITE | |
https://news.ycombinator.com/item?id=34346411 | |
https://epilys.github.io/bibliothecula/notekeeping.html | |
https://zeroclarkthirty.com/2022-05-21-json-diffing-with-sqlite | |
https://lobste.rs/s/ts0vtk/sqlite_is_not_toy_database | |
https://news.ycombinator.com/item?id=26580614 | |
https://news.ycombinator.com/item?id=26217754 | |
https://antonz.org/sqlite-3-35/ | |
https://news.ycombinator.com/item?id=26103776 | |
https://sqlite.org/forum/info/dfd4739c57e02eea | |
https://dgl.cx/2020/06/sqlite-json-support | |
https://www.sqlite.org/sqlanalyze.html | |
https://old.reddit.com/r/sqlite/comments/ocsahk/db_encryption/ | |
https://jcuenod.github.io/bibletech/2021/07/26/full-text-search-for-pdfs/ | |
https://news.ycombinator.com/item?id=28050198 | |
https://news.ycombinator.com/item?id=28259104 | |
https://news.ycombinator.com/item?id=29727707 | |
In production
We've been using SQLite as our principal data store for 6 years. Our application services potentially hundreds of simultaneous users at once, each pushing 1-15 megabytes of business state to/from disk 1-2 times per second. | |
We have not had a single incident involving performance or data integrity issues throughout this time. The trick to this success is as follows: | |
- Use a single SqliteConnection instance per physical database file and share it responsibly within your application. I have seen some incorrect comments in this thread already regarding the best way to extract performance from SQLite using multiple connections. SQLite (by default for most distributions) is built with serialized mode enabled, so it would be very counterproductive to throw a Parallel.ForEach against one of these. | |
- Use WAL. Make sure you copy all 3 files if you are grabbing a snapshot of a running system, or moving databases around after an unclean shutdown. | |
- Batch operations if feasible. Leverage application-level primitives for this. Investigate techniques like LMAX Disruptor and other fancy ring-buffer-like abstractions if you are worried about millions of things per second on a single machine. You can insert many orders of magnitude faster if you have an array of contiguous items you want to put to disk. | |
- Just snapshot the whole VM if you need a backup. This is dead simple. We've never had a snapshot that wouldn't restore to a perfectly-functional application, and we test it all the time. This is a huge advantage of going all-in with SQLite. One app, one machine, one snapshot, etc... | |
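The "single connection, batch operations" advice in the quote above can be sketched in Go as one writer goroutine that drains a channel and flushes items in groups. This is only an illustration under assumptions: batchWriter is a hypothetical helper, rows are plain strings, and a real flush function would wrap each batch in one transaction on the shared SQLite connection:

```go
package main

import "fmt"

// batchWriter drains items from in and hands them to flush in groups of
// at most size. It runs in the calling goroutine only, so flush is never
// called concurrently (mirroring "one connection, shared responsibly").
func batchWriter(in <-chan string, size int, flush func([]string)) {
	batch := make([]string, 0, size)
	for item := range in {
		batch = append(batch, item)
		if len(batch) == size {
			flush(batch)
			batch = make([]string, 0, size)
		}
	}
	if len(batch) > 0 {
		flush(batch) // final partial batch once the channel is closed
	}
}

func main() {
	in := make(chan string)
	done := make(chan struct{})
	go func() {
		defer close(done)
		batchWriter(in, 3, func(b []string) {
			// A real flush would insert these rows in one transaction.
			fmt.Println("flushing", len(b), "rows")
		})
	}()
	for i := 0; i < 7; i++ {
		in <- fmt.Sprintf("row-%d", i)
	}
	close(in)
	<-done
}
```

This version only flushes on a full batch or when the channel closes; a production writer would probably also flush on a timer so that a slow trickle of rows does not sit in memory.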
DATA MODELING
https://minimalmodeling.substack.com/archive?sort=new | |
https://rtpg.co/2021/06/07/changes-checklist.html | |
https://news.ycombinator.com/item?id=27482243 | |
https://lobste.rs/s/x0fk0a/simple_graph_graph_database_sqlite | |
https://johnnydecimal.com/
A "status" field should be representable as a finite-state machine; otherwise it is better to model it with several flags
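A sketch of what "representable as a finite-state machine" could look like for a status field: every status and every allowed transition is listed explicitly. The job-run statuses used here are hypothetical examples, not taken from the actual schema:

```go
package main

import "fmt"

// Status is the value stored in a "status" column.
type Status string

// Hypothetical statuses for a job run.
const (
	Queued  Status = "queued"
	Running Status = "running"
	Done    Status = "done"
	Failed  Status = "failed"
)

// transitions lists, for each status, the statuses it may move to.
// Done has no entry, so it is a terminal state.
var transitions = map[Status][]Status{
	Queued:  {Running},
	Running: {Done, Failed},
	Failed:  {Queued}, // retry
}

// CanTransition reports whether moving from one status to another is allowed.
func CanTransition(from, to Status) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Queued, Running))
	fmt.Println(CanTransition(Done, Running))
}
```

If the allowed moves cannot be written down as a table like this (for example because two "statuses" can be true at the same time), that is a sign the field is better modeled as separate flags.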
Tags, keywords
https://news.ycombinator.com/item?id=33248391 | |
https://twitter-thread.com/t/1534301374166474752 | |
SEARCH / INDEXING
https://spinscale.de/posts/2020-10-20-search-engines-and-libraries-overview.html | |
https://news.ycombinator.com/item?id=28187675 | |
https://scribe.rip/p/what-every-software-engineer-should-know-about-search-27d1df99f80d | |
SCHEDULING / CRON | |
https://trstringer.com/systemd-timer-vs-cronjob/ | |
https://wiki.archlinux.org/title/Systemd/Timers | |
https://github.com/alash3al/exeq/blob/main/internals/queue/job.go | |
https://www.fullstory.com/blog/why-errgroup-withcontext-in-golang-server-handlers/ | |
PROGRAM STRUCTURE
http helpers: handler returning error STEAL THIS! | |
https://vladimir.varank.in/notes/2021/03/little-things-of-go-http-handlers/ | |
discussion: https://old.reddit.com/r/golang/comments/mhf04c/little_things_of_go_http_handlers/ | |
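A minimal sketch of the "handler returning error" pattern, assuming all errors map to a logged 500 response. The HandlerFunc type and the sample handlers are hypothetical and not copied from the linked article:

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// HandlerFunc is like http.HandlerFunc, but may return an error.
type HandlerFunc func(w http.ResponseWriter, r *http.Request) error

// ServeHTTP makes HandlerFunc an http.Handler and handles errors in one place.
func (f HandlerFunc) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if err := f(w, r); err != nil {
		log.Printf("%s %s: %v", r.Method, r.URL.Path, err)
		http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
	}
}

// failing is a sample handler that always returns an error.
func failing(w http.ResponseWriter, r *http.Request) error {
	return errors.New("database unavailable")
}

// succeeding is a sample handler that writes a response and returns nil.
func succeeding(w http.ResponseWriter, r *http.Request) error {
	_, err := fmt.Fprint(w, "ok")
	return err
}

// statusOf runs h against an in-memory request and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusOf(HandlerFunc(succeeding), "/"))
	fmt.Println(statusOf(HandlerFunc(failing), "/"))
}
```

A natural extension is a custom error type carrying a status code, so that handlers can signal 400 or 404 without writing the response themselves.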
https://eli.thegreenplace.net/2021/a-comprehensive-guide-to-go-generate/ | |
https://www.simplethread.com/20-things-ive-learned-in-my-20-years-as-a-software-engineer/ | |
https://www.gobeyond.dev/ | |
https://goatspeed.substack.com/p/putting-context-into-context | |
https://www.ardanlabs.com/blog/2019/09/context-package-semantics-in-go.html | |
https://millhouse.dev/posts/graceful-shutdowns-in-golang-with-signal-notify-context | |
https://www.youtube.com/watch?v=ZdXDjYsH83M&list=PLtoVuM73AmsIQv2wba8Hpl424XmWQZu5E&index=33 | |
https://developer20.com/http-connection-livetime/ | |
https://www.joeshaw.org/error-handling-in-go-http-applications/ | |
https://ketansingh.me/posts/golang-x-sync/ | |
https://lobste.rs/s/vzdoor/cult_go_test | |
https://pkg.go.dev/gotest.tools/v3/assert | |
https://freshman.tech/linting-golang/ | |
https://blog.carlmjohnson.net/post/2021/how-to-use-go-embed/ | |
https://github.com/benbjohnson/hashfs | |
CONFIG: https://bitfieldconsulting.com/golang/cuelang-exciting | |
ERRORS: | |
https://peter.bourgon.org/blog/2019/09/11/programming-with-errors.html | |
https://blog.carlmjohnson.net/post/2020/working-with-errors-as/ | |
https://github.com/valyala/quicktemplate | |
or
https://github.com/benbjohnson/ego | |
SECURITY
https://news.ycombinator.com/item?id=30514560 | |
https://purelymail.com/docs/security | |
https://news.ycombinator.com/item?id=26851037 | |
https://mvsp.dev/mvsp.en/index.html | |
https://news.ycombinator.com/item?id=30499618 | |
LOGGING | |
https://blog.kowalczyk.info/article/fc9203f7c72a4532b1ae51d018fef7b3/trade-offs-in-designing-versatile-log-format.html | |
https://presstige.io/p/Logging-HTTP-requests-in-Go-233de7fe59a747078b35b82a1b035d36 | |
ANALYTICS, METRICS, REPORTS, STATISTICS
https://www.robinlinacre.com/parquet_api/ | |
https://www.robinlinacre.com/demystifying_arrow/ | |
https://deepnote.com/@abid/Data-Science-with-DuckDB-9KKvj1EoQrmj6nj4Y2prkg# | |
parquet & duckdb! | |
https://news.ycombinator.com/item?id=31355050 | |
https://news.ycombinator.com/item?id=29966238 | |
https://old.reddit.com/r/Python/comments/y8tu99/analyzing_46_million_mentions_of_climate_change/ | |
TESTING | |
https://www.clinicallyawesome.com/2021/10/go-reference-mutation-testing.html | |
https://earthly.dev/blog/property-based-testing/ | |
LOCALIZATION / I18N
https://www.alexedwards.net/blog/i18n-managing-translations | |
WEB INSPIRATION
https://apps.npr.org/best-books/index.html | |
http://blog.apps.npr.org/2019/12/03/book-concierge.html | |
https://shepherd.com | |
https://fivebooks.com/ | |
https://hackernewsbooks.com/ | |
https://muan.co/ | |
https://mxb.dev/blog/container-queries-web-components/ | |
HTML / CSS | |
https://tdarb.org/blog/notice-box.html | |
https://tdarb.org/blog/craigslist-gallery.html | |
https://elisehe.in/2022/10/16/attribute-selectors | |
https://news.ycombinator.com/item?id=32972004 | |
https://news.ycombinator.com/item?id=30512512 | |
https://ishadeed.com/article/defensive-css/ | |
https://www.joshwcomeau.com/css/custom-css-reset/ | |
https://1linelayouts.glitch.me/ | |
https://web.dev/patterns/layout/ | |
https://www.matuzo.at/blog/html-boilerplate/ | |
https://markodenic.com/css-tips/ | |
https://markodenic.com/html-tips/ | |
https://jdan.github.io/98.css/?ref=hn | |
https://docs.google.com/presentation/d/1hvnPpsJo44BTPfJx28CV95vqk_dt6na1awUbk0kmZYM/edit#slide=id.g3e31444916_1_13 | |
https://emoji.muan.co/# | |
https://dohliam.github.io/dropin-minimal-css/ | |
https://news.ycombinator.com/item?id=27388691 | |
https://riggraz.dev/no-style-please/ | |
https://engineering.kablamo.com.au/posts/2021/my-first-css | |
https://news.ycombinator.com/item?id=28116888 | |
https://www.joshwcomeau.com/tutorials/css/ | |
https://www.smashingmagazine.com/2021/07/hsl-colors-css/ | |
support reader mode: https://www.ctrl.blog/entry/browser-reading-mode-metadata.html
https://www.joshwcomeau.com/css/designing-shadows/ | |
https://www.gwern.net/Sidenotes | |
https://7guis.bradwoods.io/flight-booker/ | |
https://github.com/argyleink/gui-challenges | |
tables: https://alistapart.com/article/web-typography-tables/
UI / UX | |
https://news.ycombinator.com/item?id=31502193 | |
https://open-ui.org/components/datepicker.research | |
https://www.bbc.co.uk/gel/guidelines/how-to-write-useful-error-messages | |
ETL (Postgres)
Dr. Martin Loetzsch did a great video, ETL Patterns with Postgres. He covers some really good topics:
- Instead of updating tables, build their replacements under a different name, then rename them. This makes updating heavy-to-compute tables instant. Works even for schemas: rebuild a schema as schemaname_next, rename the current one to schemaname_old, then rename schemaname_next to schemaname.
- Keep all the source data raw and disable WAL, you don't need it for ETL.
- Set memory limits high.
And lots of other good tips for doing ETL/DW in postgres. It's here: https://www.youtube.com/watch?v=whwNi21jAm4
I really appreciate having data in postgres. It's often easy to think that a specialised DW tool will solve all your problems, but that often fails to consider things like: | |
- Developer experience. Postgres runs very easily on a local machine, more specialized solutions often don't or are tricky to setup. | |
- Learning another tool costs time. A developer can learn postgres really well in the time it takes them to figure out how to use several more specialised tools. And many devs already know postgres because it's pretty much the default DB nowadays. | |
- Analytics queries often don't need to run at warp speed. Bigquery might give you the answer in a second but if postgres does it in a minute and it's a weekly report, who cares? | |
- Postgres is boring and has been around for many years now, it will probably still be here in 10 years so time spent learning it is time well spent. More niche systems will probably be superseded by fancier, faster replacements. | |
I would go so far as to say you don't necessarily need to split out your DW from your prod DB in every case. As soon as you start splitting out a DW to a separate server you need some way to keep it in sync, so you'll probably end up duplicating some business logic for a report, maintaining some ingestion app, shuffling data around S3 or whatever. Keeping your analytics in your prod DB (or just a snapshot of yesterday's DB) is often good enough and means you will be more likely to avoid gnarly business rules going out of sync between your app and your DW.
https://github.com/mara/mara-pipelines | |
Worker pools vs semaphore | |
========================== | |
For those who want a bit more context to this discussion - it's basically the difference between having N goroutines running all the time (worker pool), waiting for work to come in from outside requests, or having every outside request start a new goroutine, but limit that with a threadsafe count of no more than N at once (semaphore). | |
The nice thing about semaphores vs. a worker pool is that with a semaphore, you can make it look more like normal old "start a goroutine" code, using a specific function. (e.g. go specificTask(a, b, c)), rather than using a worker that has to be generic and passing it some kind of function to run (e.g. workerCh <- func() { specificTask(a, b, c) }) | |
https://old.reddit.com/r/golang/comments/pzwppr/worker_pool_vs_semaphore/ |
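A sketch of the semaphore variant described above, using a buffered channel as the counting semaphore. The runLimited and measure functions are hypothetical; measure exists only to show that the limit holds:

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited starts one goroutine per task, but lets at most n run at
// once, using a buffered channel as a counting semaphore.
func runLimited(n int, tasks []func()) {
	sem := make(chan struct{}, n)
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire: blocks while n tasks are running
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release
			task()
		}(task)
	}
	wg.Wait()
}

// measure runs count trivial tasks with limit n and returns the highest
// number of tasks observed running at once, and how many tasks ran.
func measure(n, count int) (peak, ran int) {
	var mu sync.Mutex
	running := 0
	tasks := make([]func(), count)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			running++
			ran++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			mu.Lock()
			running--
			mu.Unlock()
		}
	}
	runLimited(n, tasks)
	return peak, ran
}

func main() {
	peak, ran := measure(3, 20)
	fmt.Println("tasks run:", ran, "peak concurrency:", peak)
}
```

Each task is still launched with an ordinary go statement around a specific function call, which is the readability advantage the quote points out compared to a generic worker loop.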