Going through the hard drive. Opening a spreadsheet...
http://diversions.lan/media/usb/Diversies/Exports/

Exports: come from different collections: America, Musical Instrument Museum;
The materials come from the 4 curators that Constant is involved with.

FIRST ROW OF SPREADSHEET:
N° d_inventaire | Nom de l_objet | Titre FR | Titre NL | Titre EN | Classification | Auteurs/Cultures | Depuis | À | Période | Dimensions | Matériaux/Techniques | Réf_ Géo_ | Image standard

There's no English in my Americas file.

The procedure: we look for different versions of the same item in different locations. Correlate the item.

The first picture in the Americas collection:
http://diversions.lan/media/usb/Diversies/Images/America/AAM%2000030.12.jpg
http://diversions.lan/media/usb/Diversies/Exports/Export%20amerique.xlsx#row=35

AAM 00030
Pointe (harpon) (Nom de l'objet->Outil et équipement->Composant d'outil et d'équipement->Composant d'équipement de pêche)
Eskimo
Os
Lieu de production: Alaska (état) (Référence Géographique->Amérique->Amérique du Nord->États-Unis)
M:\Images standard\135379.jpg

Mia made the link - Well, not the real one.
http://www.carmentis.be/
Searching by name gives a different result than above; searching by ID number doesn't give a result for some of the objects that haven't been introduced into the MuseumPlus database yet.

Struggling to find the item in carmentis... What to search for? Interestingly, the inventory number isn't found.
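The correlation step described above (finding the image files that belong to one row of the export) could be sketched in Python. This is only a sketch under assumptions: it assumes that an image file name starts with the inventory number followed by a dot, as `AAM 00030.12.jpg` does for `AAM 00030`. The function name `images_for_item` and the sample listing are made up for illustration; only the first file name comes from the notes.

```python
from urllib.parse import unquote


def images_for_item(inventory_number, image_names):
    """Return the image names that seem to belong to one inventory number.

    Assumption: an image belongs to an item when its (URL-decoded) file name
    starts with the inventory number followed by a dot, e.g.
    "AAM 00030.12.jpg" or "AAM 00030.jpg" for "AAM 00030".
    """
    prefix = inventory_number.strip() + "."
    matches = []
    for name in image_names:
        decoded = unquote(name)  # "AAM%2000030.12.jpg" -> "AAM 00030.12.jpg"
        if decoded.startswith(prefix):
            matches.append(decoded)
    return matches


# Hypothetical directory listing; only the first name appears in the notes.
listing = ["AAM%2000030.12.jpg", "AAM%2000039.69.jpg", "AAM%20000301.jpg"]
print(images_for_item("AAM 00030", listing))
```

Matching on the prefix plus a dot (rather than on the bare prefix) keeps `AAM 000301` from being mistaken for `AAM 00030`.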
Advanced search:
http://carmentis.be/eMuseumPlus?service=RedirectService&sp=Scollection&sp=SfieldValue&sp=0&sp=6&sp=2&sp=SdetailList&sp=0&sp=Sdetail&sp=0&sp=F

For everything in Americas: use the corresponding drop-down item in the advanced search field "Collection":
http://carmentis.be/eMuseumPlus?service=RedirectService&sp=Scollection&sp=SfieldValue&sp=0&sp=8&sp=3&sp=Slightbox_3x4&sp=0&sp=Sdetail&sp=0&sp=F
returns 1884 items

See also search help:
http://carmentis.be/eMuseumPlus?service=WebAsset&url=helpText/search_tips/SearchTips_en.html&contentType=text/html
This page lists functionalities: drop-down menus, and drop-down menus that use 'thesauri' (meaning?)

These are the fields that the website search facility offers:
Classification, Collection, Creator, Culture, Depository, Material/Technique, Object name, Objects with an image, Owner, Place, References, Research projects, Year: from – to

...we are looking at the website as an object, at different layers, structure, scale (zoom in/out for detail) - oh look! those appear in Michael's py script with selenium (see below).

Screen scraping: comes from scratching. Based on the visual structure of the website: extract information from it.
! URLs expire
Other kind of scraping: the Selenium project. Automated browsers. http://docs.seleniumhq.org/

Searching by date via the website...
If you look for items by date, the date seems to be the date of the metadata instead of the dating of the items
>> for instance, searching between 2000 and 2016, we found:
http://www.carmentis.be/eMuseumPlus?service=RedirectService&sp=Scollection&sp=SfieldValue&sp=0&sp=1&sp=3&sp=Slightbox_3x4&sp=0&sp=Sdetail&sp=0&sp=F
but this item was dated after 1600...
http://www.carmentis.be/eMuseumPlus?service=direct/1/ResultLightboxView/result.t1.collection_lightbox.$TspTitleImageLink.link&sp=10&sp=Scollection&sp=SfieldValue&sp=0&sp=1&sp=3&sp=Slightbox_3x4&sp=0&sp=Sdetail&sp=0&sp=F&sp=T&sp=9
here it seems better: but between 1901 and 2000
http://www.carmentis.be/eMuseumPlus?service=direct/1/ResultLightboxView/result.t1.collection_lightbox.$TspTitleImageLink.link&sp=10&sp=Scollection&sp=SfieldValue&sp=0&sp=1&sp=3&sp=Slightbox_3x4&sp=0&sp=Sdetail&sp=0&sp=F&sp=T&sp=2
>>>> actually no, it is worse: the dating seems to be a "period of time", because after 1600 = until now

from "1978" to "1978"
> found object 1: "AAM 00039.69", date value of found object 1: "after 1600" ...i suppose this is correct!
> found object 2: "ETAM 02005.1.5", date value of found object 2: "1950 / 2000" ...this is less correct?
from "1950" to "2000"
> found object 3: "ETAM 00091.1.34", date value of found object 3: "1901 / 2000" ...this is less correct?

Searching for today....

Script using selenium (http://www.seleniumhq.org/) to scrape data from the collection's website: given a file URL, the code will open the image, save the file locally and either display the metadata of the file or save it locally in a CSV file (through a pipeline).
Scraping with selenium has the advantage of repeating actions automatically and being very precise.
selecting elements by ID with selenium: *.find_element_by_css_selector("li," p) *

This is the software that the website is built on, MuseumPlus: http://www.zetcom.com/en/products/museumplus/
For comparison: http://www.dspace.org/ (favoured by libraries)

Possible tools to combine with: imagemagick (Swiss army knife for image tools)
imagemagick is the package, there are other functions? e.g. the imagemagick montage command: montage -label %t pointe/*.jpg

ML: An pointing out not to get overwhelmed by the programming language. I think it can help to keep in mind that programs essentially automate what we might do manually.
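In that spirit, the date behaviour noted above can be checked by hand in a few lines of Python. This is a sketch resting on two guesses about the website, neither confirmed by it: that the search returns any object whose dating period overlaps the queried years, and that "after 1600" is an open-ended period running until now (taken as 2016 here, the upper bound used in the search above). The function names `parse_period` and `overlaps` are made up for illustration.

```python
import re


def parse_period(text, now=2016):
    """Turn a date value from the website into a (start, end) pair of years.

    Assumptions: "1950 / 2000" means 1950 to 2000, and "after 1600" means
    1600 until `now` (an open-ended period is closed at the current year).
    """
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if text.strip().lower().startswith("after"):
        return years[0], now
    if len(years) == 1:
        return years[0], years[0]
    return years[0], years[1]


def overlaps(period, query_from, query_to):
    """True when the object's period and the queried range share a year."""
    start, end = period
    return start <= query_to and query_from <= end


# Date values as recorded in the notes above.
found = {
    "AAM 00039.69": "after 1600",
    "ETAM 02005.1.5": "1950 / 2000",
    "ETAM 00091.1.34": "1901 / 2000",
}
for number, value in found.items():
    print(number, overlaps(parse_period(value), 1978, 1978))
```

Under these assumptions all three objects match a search from 1978 to 1978, which would be consistent with the results above: the search would then be answering "could this object date from that year?" rather than "was it made in that year?".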
MAKE A POSTER OF A FOLDER OF IMAGES:
montage -label %t pointe/*.jpg pointe.png

Go on the website search facility and search for all items by using an asterisk (*): returns 75708 items

Find the script to scrape: https://gitlab.constantvzw.org/diversions
You need to sign up (on the Constant server) and can then create a zip file to download (button at the right).

Leaflet - often used for geographic applications, OpenStreetMap: http://leafletjs.com/
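The poster command noted above (montage) could also be driven from Python, for instance to make one poster per folder. A minimal sketch, assuming ImageMagick is installed and its `montage` program is on the path; the function names `montage_command` and `make_poster` are made up for illustration, and only the `-label %t` option from the notes is used.

```python
import glob
import subprocess


def montage_command(folder, output, pattern="*.jpg"):
    """Build the argument list for ImageMagick's montage.

    "-label %t" labels each image with its file name (without extension),
    as in the command from the notes. The shell would expand pointe/*.jpg
    itself; here glob does that job.
    """
    images = sorted(glob.glob(f"{folder}/{pattern}"))
    return ["montage", "-label", "%t", *images, output]


def make_poster(folder, output):
    """Run montage on all JPEGs in a folder (requires ImageMagick)."""
    subprocess.run(montage_command(folder, output), check=True)


# Only print the command here; call make_poster("pointe", "pointe.png")
# to actually produce the poster.
print(" ".join(montage_command("pointe", "pointe.png")))
```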