Thoughts on Word2Vec AI for information retrieval applications

Study 7 min.
It has been a bit more than a year since I put Chantal AI online, with 3 major design iterations so far. It’s time to compile what I learned from that. Introduction: information retrieval. So you have documents and pages containing information and knowledge (HTML, plain text and PDF) useful to your business. That’s better than not having documents. The problem is, as the number of documents increases, information gets more and more buried… The best place to hide a tree is in a forest.
Read more →

Designing an AI search engine from scratch in the 2020s

Study 19 min.
This is a follow-up to the previous post, Websites suck, which covered the preliminary information retrieval step. Introduction. On the open-source planet, in the 2020s, information is scattered over many websites: scientific journals for theory, specification sheets for standards and protocols, software documentation for “how to use tool”, blogs and YouTube tutorials for “how to achieve goal”, forums and support for “how to solve problems”, GitHub for “what is known to break” and “why design (or lack thereof) was done this way”, source code for implementation details, and books for everything considered worthy of payment for access.
Read more →

Websites suck.

Opinion 6 min.
I have spent the past month working on an AI-based search engine. When you go on the darktable subreddit, you will find the question “why do lighttable’s thumbnails look different from darkroom preview” asked every other week. The question has been answered many times on this subreddit and on various forums, and I even put the answer in the main README file, displayed on the GitHub main page of the software. To no avail.
Read more →

Open source and professional photography: lies and wishes

Opinion 8 min.
There is one thing you will find on the home page of pretty much any open source (call it libre or free if you will, those lines are blurred) image editing software: the promise that it is, somehow, suitable for professionals. Marketing has abused that word for decades; it is only natural that it should affect non-commercial and non-profit projects as well, just to try to buy some cheap credibility.
Read more →

Image processing does not kill people… and it's a shame

25 min.
Among the technical fields, quite a few have the potential to harm the public: the first that come to mind are medicine and civil engineering. Both have in common their scientific basis: studies, data, models and history form a corpus of knowledge and tools used by practitioners to help make choices. However scientific their basis is, the practice remains an art or a craft. Indeed, while the state of the art provides models, data and methods, it is the practitioner’s responsibility to identify which model applies to the current circumstances, which tools are best suited to the current situation, and which priorities will make the best solution stand out among the reasonable ones.
Read more →

Bilinear interpolation on images stored as Python NumPy ndarray

8 min.
If you are working in image processing and using Python as a scripting language to prototype and test algorithms, you might have noticed that all the libs providing fast image interpolation methods (to either sub-sample or over-sample) work in 8-bit unsigned integers (uint8). This is quite annoying if you are working with floating point images. PIL supports floating point interpolation, but only for one layer, thus forget about RGB, and scipy.
Read more →

Make Jupyter Notebooks easy to blog in WordPress

13 min.
I have struggled with most solutions to convert and embed Jupyter notebooks into WordPress blog posts, since I use Plotly as a graphics lib, as well as many LaTeX equations and images. Finally, I had to code my way through. Here is what I did: Write the Jupyter notebook. Nothing that you don’t know here. If you embed pictures in the notebook though, it would be good to upload them to your WordPress media library, then use the external URL to include them (from your server) in the Markdown cells of Jupyter.
Read more →