Summary of DCDC22 | Developing computational skills for digital collections: a new Programming Historian series

This is an AI generated summary. There may be inaccuracies.

00:00:00 - 01:00:00

The video discusses the need for computational skills for digital collections and provides examples of how these skills can be used to improve the accessibility and readability of scholarly outputs. It also provides a brief overview of the technology behind the Shiny application and discusses some of the challenges of creating tutorials with this type of content.

  • 00:00:00 The project developed two tutorials on best practices for interrogating digital collections, one specifically focused on large-scale born-digital content. The first of these tutorials will be published soon, and the series has seven in total. The aims of the project are to help address a skills deficit among researchers in the analysis and use of digital collections, and to help archives increase their digital preservation capacity.
  • 00:05:00 The video discusses how Programming Historian seeks to help digital collections professionals by providing quality resources. The session, presented at DCDC22, is titled "Developing computational skills for digital collections: a new Programming Historian series." The video introduces the project's creators, Peter Marchionne and Joe Paula, and explains their reasons for wanting to create the platform. The creators state that they are confident that the resources produced by the project will be of high quality and useful to researchers.
  • 00:10:00 This tutorial presents three case studies of automated document classification using word embeddings. Jenny Williams from the British Library discusses how her team is using natural language processing to identify communities of early innovative activity. John Reed from the Centre for Advanced Spatial Analysis at University College London shares his project's results, which show that researchers are more likely to mention specific technologies when they write about their work. Finally, Peter Webster from Programming Historian presents the last case study, in which a historian uses machine learning to identify the authors of historical papers.
  • 00:15:00 This 1-hour tutorial, "DCDC22 | Developing computational skills for digital collections: a new Programming Historian series", provides a high-level overview of how natural language processing and word embeddings can be used to engage with archives more effectively. The tutorial also discusses the DDC (Dewey Decimal Classification), an expert-assigned label, and how it was used to select four disciplines for study from the data set.
  • 00:20:00 The video discusses how computational skills are important for digital collections, and how word embeddings are used to preserve contextual relationships. It then shows a process of reducing the dimensionality of a vector, clustering documents, and measuring the accuracy of the process. Finally, the video demonstrates how word clouds can be used to display the results of the algorithms.
  • 00:25:00 Chahan Vidal-Gorène is a doctoral student and the president and founder of Calfa, which specializes in text detection and automated analysis technologies for manuscripts in oriental languages. He will present a lesson on "AI for automated transcription of historical documents," focusing on under-resourced collections.
  • 00:30:00 The video discusses the problem of under-resourced digital collections, and how optical character recognition (OCR) and handwritten text recognition (HTR) can help to solve this problem. OCR is a deep learning approach that can be used to identify text in a document. HTR works by identifying regions of interest in a document, and then creating models to recognize text within those regions. These models can then be used to transcribe or search for text in a document. While OCR is a very effective approach, it is often not feasible or accurate in under-resourced cases, because it requires a significant amount of data and the state-of-the-art machine learning architectures are designed for Latin scripts. For non-Latin scripts, such as Arabic or Greek, recognition can be less accurate. It is also often necessary to define precise specifications for the data used in the recognition process. Finally, creating relevant training data is often difficult, because it requires knowing the specific purposes for which the recognition will be used, as well as the specifications for the text to be recognized.
  • 00:35:00 The presenter discusses the challenges of transcribing manuscripts with poor quality printing, and how to overcome them with specialized recognition models. The platform used in the lesson, Calfa Vision, includes real-time AI fine-tuning to help with transcription.
  • 00:40:00 This video series introduces the concept of computational skills for digital collections and provides examples of how these skills can be used to improve the accessibility and readability of scholarly outputs. It also provides a brief overview of the technology behind the Shiny application and discusses some of the challenges of creating tutorials with this type of content.
  • 00:45:00 This YouTube video introduces the Shiny package and its usefulness for creating interactive applications. The user interface of a Shiny application is composed of visual inputs and outputs. The user also needs to be familiar with the programming language behind Shiny, since it is not possible to create sophisticated applications without knowing it.
  • 00:50:00 In this video, the principles of reactive programming are introduced, and the basics of a Shiny user interface are explained. The tutorial then demonstrates how to create an interactive map using a data set of places and their geographic coordinates.
  • 00:55:00 The presenter discusses the pros and cons of learning different programming languages, focusing on R and Python. He says that which one is best for a user is a matter of personal preference. He also mentions that R can be used for machine learning and AI tasks, while Python is more popular. Finally, he says that R and Python can be used together.
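
The process summarized at 00:20:00 (embed documents, cluster them, and measure accuracy against known labels) can be sketched in plain Python. This is a minimal illustration, not the tutorial's code: the three-dimensional `TOY_EMBEDDINGS`, the six sample documents, the hand-written `k_means` and the `purity` score are all assumptions made for the example, and the dimensionality-reduction and word-cloud steps are left out.

```python
import math
import random
from collections import Counter

# Hypothetical 3-dimensional word vectors; real embeddings are learned from a corpus.
TOY_EMBEDDINGS = {
    "archive": [0.9, 0.1, 0.0],
    "manuscript": [0.8, 0.2, 0.1],
    "catalogue": [0.85, 0.15, 0.05],
    "neuron": [0.1, 0.9, 0.1],
    "protein": [0.0, 0.8, 0.2],
    "cell": [0.1, 0.85, 0.15],
}


def document_vector(tokens):
    """Represent a document as the average of its known word vectors."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("document contains no known words")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def k_means(points, k, iterations=20, seed=0):
    """Basic k-means: returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def purity(assignments, labels):
    """Share of documents carrying the majority label of their cluster."""
    correct = 0
    for cluster in set(assignments):
        in_cluster = [l for a, l in zip(assignments, labels) if a == cluster]
        correct += Counter(in_cluster).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical documents with an expert-assigned discipline label.
documents = [
    (["archive", "manuscript"], "history"),
    (["catalogue", "archive"], "history"),
    (["manuscript", "catalogue"], "history"),
    (["neuron", "cell"], "biology"),
    (["protein", "cell"], "biology"),
    (["neuron", "protein"], "biology"),
]

vectors = [document_vector(tokens) for tokens, _ in documents]
labels = [label for _, label in documents]
assignments = k_means(vectors, k=2)
score = purity(assignments, labels)
print(assignments, score)
```

On a real collection the vectors would come from a trained embedding model and have hundreds of dimensions, which is where the dimensionality-reduction step mentioned in the video becomes useful.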
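
The reactive idea introduced at 00:50:00, in which outputs are recomputed automatically when an input changes, can be illustrated without Shiny itself. The sketch below is written in Python for illustration only and is not Shiny's API: the `ReactiveValue` class, the `PLACES` records and the `render_map` function are hypothetical names, and the "map" is reduced to a list of places with their coordinates.

```python
class ReactiveValue:
    """Holds a value and re-runs registered observers whenever it changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def get(self):
        return self._value

    def set(self, value):
        if value == self._value:
            return  # nothing changed, so nothing needs to be redrawn
        self._value = value
        for observer in self._observers:
            observer()

    def observe(self, callback):
        self._observers.append(callback)
        callback()  # run once so the output starts in sync with the input


# Hypothetical data set of places and their geographic coordinates.
PLACES = [
    {"name": "Place A", "lat": 51.5, "lon": -0.1, "year": 1750},
    {"name": "Place B", "lat": 53.5, "lon": -2.2, "year": 1820},
    {"name": "Place C", "lat": 55.9, "lon": -3.2, "year": 1890},
]

# Input: plays the role of a slider in the user interface.
year_limit = ReactiveValue(1900)

# Output: stands in for the markers drawn on the map.
visible_places = []


def render_map():
    """Recompute the output from the current input value."""
    visible_places[:] = [
        (p["name"], p["lat"], p["lon"])
        for p in PLACES
        if p["year"] <= year_limit.get()
    ]


year_limit.observe(render_map)  # all three places are visible
year_limit.set(1800)            # the output updates without calling render_map directly
print(visible_places)
```

The point of the design is that the code changing the input never calls the rendering function itself; the dependency is declared once and the update follows from it.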

01:00:00 - 01:25:00

The video discusses the use of computational skills for digital collections, focusing on the importance of data accessibility and sustainability. It discusses the potential issues with using these skills in a catalog search context, and offers suggestions for improving the user experience.

  • 01:00:00 The video discusses the use of computational skills for digital collections, focusing on the importance of data accessibility and sustainability. It discusses the potential issues with using these skills in a catalog search context, and offers suggestions for improving the user experience.
  • 01:05:00 In this video, Jenny and John give a quick overview of word embeddings and how they can be used to analyze digital collections. John then asks Jenny a question about converting a quote into a question. Jenny explains that the so-called misclassification expands on her earlier point that different textual classifications can exist for the same document.
  • 01:10:00 In this video, digital curation experts discuss the emerging field of computational skills for digital collections, which focuses on developing programming skills to help with the structure and retrieval of documents. One panelist notes that, while the field of digital curation is still evolving, there are ways to improve the accuracy of the transcript. Another panelist introduces James Baker, a member of the core Programming Historian team, who will discuss some of the technological advancements in the field.
  • 01:15:00 The presentation by James Baker reflects on the implications of the project "Developing computational skills for working with large-scale collections" as it moves closer to publication. The project has had a broad impact, including reaching 1.5 million readers per year. Baker discusses the impact of the project on his own thinking about Programming Historian and its model of work.
  • 01:20:00 The video series "DCDC22: Developing computational skills for digital collections: a new Programming Historian series" starts by discussing the challenges of writing about large data sets, which is a topic Isabelle brought up. The first article in the series discusses sustainability, and the author reflects on how they managed to retain some of that in their work despite being burned before. The second article discusses how the author feels about producing articles simultaneously, and how it emphasizes the importance of writing for global audiences. The third article discusses the importance of producing articles simultaneously and how it shows the effort and time required to produce quality work.
  • 01:25:00 The YouTube video "DCDC22 | Developing computational skills for digital collections: a new Programming Historian series" discusses how Programming Historian is working to develop computational skills for digital collections. The series includes articles in other languages, and one has already been localized during translation. The project is complex and involves a wide team. Thank you to everyone who has participated in making it happen.
