The program is developed based on selected proposals submitted by the community. All proposal abstracts are peer reviewed by the planning committee under a single-blind review protocol in which reviewers are blind to author and institution.
Sessions will be held online through Zoom and Discord. Information and instructions to join sessions will be sent to registered attendees via email in the week before the symposium, along with links to join workshops for workshop registrants. Materials will be placed in this year's OSF after the symposium is over.
10:20 AM Unlocking Data Engagement: Collaborating Across Library Departments for Love Data Week
Nick Ruhs (Florida State University)
Crystal Mathews (Florida State University)
Laura Pellini (Florida State University)
Promoting and engaging with students around the library's research data services often poses challenges for data librarians. At our library, we decided to use a unique and collaborative approach that brought together undergraduate data fellows, student employees from the library's Engagement and Marketing teams, and full-time library staff to plan the library's annual Love Data Week event. This presentation will showcase how the teams were able to combine their collective expertise to design an engaging, student-focused, week-long event that promoted the library's research data services to the campus community. We will discuss the development of a cross-divisional marketing strategy involving social media, newsletters, and email reminders. We will also discuss the development of activities, data-themed games and prizes, and an engaging display for tabling events at two campus libraries. We'll conclude by sharing lessons learned and strategies for all data librarians to collaborate across boundaries in promoting their services.
Learning Objectives:
1) Understand the benefits of collaborative approaches in promoting research data services to students.
2) Explore effective strategies for engaging students through events like Love Data Week, utilizing cross-divisional collaboration and marketing tactics.
3) Examine the integral role of student data fellows and library employees in the planning and execution of data-themed library events, gaining insights into how their involvement contributes to the success of student-focused data promotion initiatives.
10:40 AM Implementation of Data Management Services at a Smaller University
Eric Resnis (Coastal Carolina University)
Adam Johnson (Coastal Carolina University)
While data management services are ubiquitous at R1 and R2 institutions, these services are not as prevalent at smaller institutions. Smaller staff size, a deeper focus on undergraduate education, and an absence of large-scale research infrastructure all contribute to the lack of data management services. However, the need for these services exists at smaller institutions as well, with researchers often navigating complicated data management concerns themselves. Libraries can provide a framework to remedy these issues.
This session will detail the implementation of data management services at a mid-sized comprehensive institution in response to administrative and faculty requests, in combination with a multi-year effort to increase the institution's research profile. The presenters will discuss how they developed a plan for data management services. They will also discuss the development of resources based on faculty input, including an extensive guide for data management and the rollout of individual consultations with faculty.
Learning Objectives:
Attendees will understand the elements required to implement data management services at smaller institutions.
Attendees will recognize potential areas for assessment and planning to implement in their own data management services.
Break 11:00 AM - 11:15 AM
Poster Sessions 11:15 AM - 12:00 PM
Data Services Internship for BIPOC Graduate Students
Syeda Mamoona Quadri (NCDS 2024 Data Intern)
Jasmine Phillips (NCDS 2024 Data Intern)
Tonja Cunningham (NCDS 2024 Data Intern)
Making the case for GIS in business school curricula
HD McKay (Vanderbilt University)
Stacy Curry-Johnson (Vanderbilt University)
Chuck Knight (Vanderbilt University)
John Paul Martinez (Vanderbilt University)
Geospatial science is all about analyzing and visualizing data variables tied to locations. Markets are all about consumer and business data (activities and trends) tied to locations. Geospatial literacy is a fundamental skill for anyone engaged in business, yet it is often absent in business curricula. At the same time, the ubiquity of GIS applications and more accessible geospatial tools (hardware and software) allow GIS specialists to collaborate productively with other disciplines. Is this a missed opportunity you are seeing at your organization? This poster will help you make the case for integrating geospatial literacy into business curricula through specific market intelligence use cases. It provides a framework for thinking about GIS data in business contexts; a summary of applications, data sources (free and subscription) and their relevance to specific use cases; and a sample lesson plan based on sessions delivered with graduate business students at Vanderbilt University.
Learning Objectives:
1) Specific use cases for incorporating geospatial data literacy into business curricula.
2) Sample lesson plan for GIS data or business librarians to start engaging business students and faculty.
3) Understanding of relevant GIS and business data sources used in industry and academia (including their scope, strengths and weaknesses).
Inclusive Computational Pedagogy
Dolsy Smith (The George Washington University)
Marcus Peerman (The George Washington University)
Emily Blumenthal (The George Washington University)
Alex Boyd (The George Washington University)
Max Turer (The George Washington University)
This poster session shares strategies, lessons learned, and assessment findings from our work on Python Camp: a four-day intensive workshop that introduces the Python language to those with little or no prior programming experience. We recently redesigned Python Camp to emphasize principles of inclusive pedagogy, aiming specifically to reach learners who don't consider themselves programmers or may have struggled to learn programming in the past. Our new curriculum is grounded in collaborative practice, allowing participants to work together to solve problems with code using a "real-world" dataset, and centering the agency of learners, who on the fourth day of Camp are invited to design their own culminating project. Throughout the Camp, we incorporate activities intended to foster practices of computational thinking: e.g., asking questions about code, developing user stories to guide development, understanding the logic of code as distinct from its syntax. Our redesigned Python Camp has been impactful for both learners and instructors, and we are eager to share our experience with others who are interested in inclusive pedagogy, teaching code in libraries, and related topics.
Learning Objectives:
1) Practice identifying and reflecting on biases that may affect our teaching.
2) Explore ways to present computational topics in ways that accommodate learners from diverse backgrounds.
3) Practice strategies for making such topics more accessible to beginners by reducing cognitive load, scaffolding, and encouraging active learning.
Students and Community Partners Navigating Food Insecurity, Many Data Points at a Time: Overview of Georgia State University Library's Public Interest Data Literacy (PIDLit) Learning Lab Course
Mandy Swygart-Hobaugh (Georgia State University)
Halley Riley (Georgia State University)
Ashley Rockwell (Georgia State University)
This poster will give an overview of the two-semester experiential-learning course designed and taught in Fall 2023 and Spring 2024 by Research Data Services (RDS) faculty from the Georgia State University Library's Public Interest Data Literacy (PIDLit) grant-funded initiative (https://lib.gsu.edu/pidlit). The "Tackling Food Insecurity" PIDLit Learning Lab connected students with partner organizations to apply data skills to address the real-world problem of food insecurity.
Learning Objectives:
(1) Give a brief overview of the course content and array of assignments.
(2) Detail the partner-driven data collection, analysis, and reporting activities in which students engaged.
(3) Highlight the successes, the challenges, and the lessons learned for future course offerings.
(4) Facilitate discussion with poster attendees regarding the benefits of developing and teaching similar applied experiential-learning courses.
Generating Institutional Level Open Science Metrics with Scripted Processes for Tracking Publication of Research Data and Software
Michael Shensky (UT Austin)
Bryan Gee (UT Austin)
This poster focuses on progress made at the University of Texas at Austin to develop a better understanding of the university’s open science landscape by developing a scripted process for gathering information about research datasets and software published by university-affiliated researchers. It discusses the use of repository and platform APIs to enable scalable collection of data. Data visualization techniques for representing generated statistics are also highlighted in the poster. It additionally provides an overview of plans for using the gathered information to inform the UT Libraries’ development, provisioning, and promotion of research data services for the university community.
Learning Objectives:
(1) the use of Python scripts and repository APIs for gathering scientometric data;
(2) data visualization techniques for representing institutional level open science metrics;
(3) the use of open science metrics to inform development of research data services
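As a rough illustration of the kind of scripted tallying the poster describes, the sketch below assumes that records have already been retrieved from repository APIs and normalized into simple dictionaries. The record fields, repository names, and DOIs are hypothetical placeholders, and the deduplication step is an assumption rather than the presenters' documented method.

```python
from collections import Counter

# Hypothetical, hand-made records standing in for normalized API responses.
# Field names, repository names, and DOIs are placeholders for illustration only.
SAMPLE_RECORDS = [
    {"doi": "10.0000/example.1", "repository": "Repo A", "type": "dataset", "year": 2022},
    {"doi": "10.0000/example.2", "repository": "Repo A", "type": "software", "year": 2023},
    {"doi": "10.0000/example.3", "repository": "Repo B", "type": "dataset", "year": 2023},
    # Same DOI surfaced by a second source; it should be counted only once.
    {"doi": "10.0000/EXAMPLE.1", "repository": "Repo B", "type": "dataset", "year": 2022},
]


def deduplicate(records):
    """Keep the first record seen for each DOI (compared case-insensitively)."""
    seen = set()
    unique = []
    for record in records:
        key = record["doi"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def count_by(records, field):
    """Tally records by one field, e.g. 'type', 'year', or 'repository'."""
    return Counter(record[field] for record in records)


unique_records = deduplicate(SAMPLE_RECORDS)
outputs_by_type = count_by(unique_records, "type")
outputs_by_year = count_by(unique_records, "year")
```

Counts like these could then feed whatever charting tool an institution prefers; the tallying step is kept separate from retrieval so that each repository's API client can change independently.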
Addressing the Professional Development Bottleneck: LEADING to DataLIS
Kay P Maye (Tulane University)
Erik Mitchell (UC San Diego)
Jane Greenberg (Drexel University)
Crystal Goldman (UC San Diego)
The demand for librarians with data science skills continues to escalate as library data collections and services continue to expand. This demand is further driven by researchers' requests for data literacy instruction. While LIS curricular offerings and professional workshops provide venues for LIS students as well as early- and mid-career professionals to gain skills, challenges still exist for those seeking to advance their skills and knowledge and for administrators seeking to support their staff in professional development while also running a library. The LEADS Program, the succeeding LEADING Program, and DataLIS have helped us to better understand current obstacles and provide insight into how we might address these challenges. This presentation will provide an overview of LEADING's inception and evolution to inspire others to engage in similar work addressing the data science professional development needs of information professionals, and will show how what we've learned may help professional development efforts in other areas. We will also provide advice for information professionals looking to upskill in data science.
Learning Objectives:
Attendees will learn more about the LEADING effort to build data science skills and have next steps to learn more.
Attendees will get a rapid introduction to Design Thinking approaches for strategic thinking around data science education for LIS professionals.
Attendees will learn about the problem and potential solution areas identified through our design thinking process to help address gaps in data science skills in the LIS workforce.
Break 12:00 PM - 1:00 PM
Networking & Social 1:00 PM - 2:30 PM
1:00 PM Birds of a Feather
Join us for a new social session where we discuss fun and interesting topics - all picked by participants.
Tue, Oct 8
Short Talks 10:00 AM - 11:00 AM
10:00 AM From Competition to Collaboration: Fostering a Culture of Data Sharing in the NIH HEAL Data Ecosystem
Heather Barnes (RTI)
Brandy Farlow (UNC Chapel Hill)
Open, accessible research data provides a foundation for scientific discovery. Despite clear benefits to data sharing and increased emphasis on making federally-funded research outcomes public, data sharing hesitancy remains common. This case study highlights work underway in the NIH HEAL Data Ecosystem (HDE) to engage with HEAL-funded investigators and promote data sharing. First, a review of recent literature highlights 1) common barriers to data sharing and 2) incentives that tend to help researchers overcome hesitancy. We then explore practical strategies research data administrators and stewards employ within the HEAL Data Ecosystem to overcome hesitancy and create a culture of data sharing.
Learning Objectives:
1) Learn about existing research on the most common factors that inhibit or encourage researchers to share their data.
2) Explore practical strategies for addressing common data sharing barriers.
3) Understand how a public research data sharing platform addresses common data sharing blockers and fosters a culture of open data.
10:20 AM Integrating Clinical Research Data Management into Data Collection Best Practices
Genevieve Milliken (NYU Langone Health)
Michelle Yee (NYU Langone Health)
The NYU Health Sciences Library's Data Education team routinely offers REDCap trainings to the NYU Medical Center. Recently, we have incorporated Clinical Research Data Management (CRDM) into our REDCap curriculum. REDCap is a secure web application for building and managing surveys and databases and is often used in clinical settings for data collection. Foregrounding CRDM at the beginning of REDCap classes teaches participants RDM with a clinical focus and asks them to think through logistical questions prior to and during project design.
Learning Objectives:
1) Participants will learn about Research Data Management in Clinical Settings
2) Participants will learn how to incorporate RDM/CRDM when teaching tools through the library
10:40 AM What Do Researchers Think about When They Think about Research Data Sensitivity?
Dessi Kirilova (Qualitative Data Repository, Syracuse University)
Derek Robey (Qualitative Data Repository, Syracuse University)
This project examines researchers' perceptions of data sensitivity and data sharing in different research scenarios. Specifically, we will share quantitative results from an online survey of 200 purposefully recruited respondents representing 12 social science disciplines, asking about their familiarity with and propensity to use resources and techniques that might enable better data management, as well as appropriate sharing of sensitive data (e.g., data use agreements and virtual enclaves). We will also discuss common themes in respondents' free-text answers to a question about how they personally define research data sensitivity. We highlight common challenges researchers perceive in planning to share what they consider to be sensitive data, which is relevant to the work of research data librarians. Relatedly, the presentation also highlights possible avenues for both general assistance and project-tailored guidance that data librarians are well poised to offer.
Learning Objectives:
1. Learn about variations in the perceptions of scholars from several social, health, environmental and other related disciplines of the possibility of sharing potentially sensitive data;
2. Appreciate the wide range of interpretations researchers hold of what constitutes "sensitive" research data;
3. Identify specific points of confusion / lack of knowledge on the topic where further library-provided education activities could be helpful.
Break 11:00 AM - 11:15 AM
Short Talks 11:15 AM - 11:55 AM
11:15 AM Introducing a new Library Carpentry course for Data Management Planning
Lena Bohman (Zucker School of Medicine at Hofstra/Northwell)
Marla Hertz (University of Alabama at Birmingham)
Data Management Plan services are increasingly in demand in libraries due to changing funder requirements. However, there is a dearth of easily accessible training for librarians on how to conduct DMP reviews. To address this gap, the presenters created a Library Carpentry course, DMP Course for Librarians, with funding from IMLS and UCLA. In this presentation, the authors will showcase the course and briefly discuss their process for creating it. The presentation will also cover how fellow practitioners can contribute to the course and teach it at their own institutions.
Learning Objectives:
1. Attendees will learn about a new resource intended to guide librarians through the early stages of enacting Data Management Plan services.
2. Attendees will learn how to contribute to the course and adopt it for use at their own institutions.
11:35 AM Sentiment analysis on virtual reference transcripts
Kristin Calvert (Western Carolina University)
Sentiment analysis uses natural language processing to determine the emotional tone of a piece of digital text. It is often used to understand user satisfaction with a service and improve the customer experience. There are many available sentiment lexicons/corpora and Python libraries that make these analyses easy to perform, even for beginning coders. However, many of these sentiment corpora are trained on Twitter, Reddit, Yelp, or customer reviews, and may perform poorly on other types of texts, like virtual reference interactions. In this talk, we will review the performance of various lexicons (e.g., VADER, NRC Emotion Lexicon, AFINN) on library chat transcripts, and discuss how to choose one for your project.
Learning Objectives:
Understand how lexicons assign sentiment values and valences.
Understand how to select a lexicon for a project.
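To make the idea of a lexicon assigning valences concrete, here is a minimal sketch of lexicon-based scoring. The word list and scores are a hypothetical toy lexicon invented for illustration; they are not taken from VADER, AFINN, or the NRC Emotion Lexicon, and real lexicons add rules (negation, intensifiers, punctuation) that this sketch omits.

```python
import re

# Hypothetical toy lexicon: word -> valence. Values are invented for illustration
# and do not come from VADER, AFINN, or the NRC Emotion Lexicon.
TOY_LEXICON = {
    "thanks": 2,
    "great": 3,
    "helpful": 2,
    "sorry": -1,
    "problem": -2,
    "broken": -2,
}


def score_text(text, lexicon):
    """Return (summed valence, number of matched words) for a piece of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    matched = [lexicon[token] for token in tokens if token in lexicon]
    return sum(matched), len(matched)


positive_example = score_text("Thanks, that was great!", TOY_LEXICON)
chat_example = score_text("Sorry, I have a problem with a broken link", TOY_LEXICON)
```

The second example scores as strongly negative even though it reads like a routine reference question, which is one way a lexicon built for reviews or social media might misjudge chat transcripts.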
Break 11:55 AM - 12:00 PM
Keynote 12:00 PM - 1:00 PM
12:00 PM Big Data Projects at the NIH: Report from an NIH DATA Scholar
Dr. Michelle Hribar (Oregon Health & Science University)
Abstract: Access to large, diverse datasets is key for the development of AI models in biomedicine. The National Institutes of Health (NIH) recognizes this need and has several ongoing efforts to increase access to data for researchers: establishing data generation projects such as Bridge2AI and All of Us, promoting access to observational health data stored in electronic health records, and encouraging data sharing. All of these efforts require data standardization, which is currently an important focus of the NIH; common data models and common data elements (CDEs) are a few example approaches for improving data standardization and data sharing.
The NIH started the Data and Technology Advancement (DATA) National Service Scholar Program in 2020 to help with large data science projects across the NIH. I served as a DATA Scholar at the National Eye Institute from 2022 to 2024, focusing on standardizing ocular observational health data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model. This talk will discuss my DATA Scholar project within the larger context of data science and standardization projects at the NIH.
Bio: Michelle R. Hribar, PhD, is an Associate Professor of Ophthalmology and Medical Informatics at Oregon Health & Science University. She was also very recently an NIH DATA Scholar at the National Eye Institute (NEI), where she worked in the Office of Data Science and Health Informatics. She is currently leading a national effort to improve the standardization of ophthalmic data for research, which includes co-chairing the Observational Health Data Sciences and Informatics (OHDSI) Eye Care and Vision Research workgroup with the goal of adding ophthalmic data to the OMOP common data model. Dr. Hribar originally trained as a computer scientist before retraining as a medical informaticist at Oregon Health & Science University. Her NIH grant-funded research has focused exclusively on informatics in ophthalmology, specifically on the reuse of electronic health record data for research in both operations and clinical applications.
Workshop 1:00 PM - 2:30 PM
1:00 PM Open data deidentification tools
Katie Pierce-Farrier (University of North Texas Health Science Center)
Christine Nieman Hislop (University of Maryland Baltimore)
Data deidentification often poses a challenge to researchers. With more funders supporting or requiring data sharing, data deidentification is a crucial step in the data management lifecycle. This workshop will introduce and explore openly available tools and technologies to help with data deidentification, including how machine learning and natural language processing apply to deidentification.
We will focus on the NLM Scrubber from the National Library of Medicine and other open NLP tools that can be used for clinical text deidentification. This workshop will include a demonstration of how to run NLM Scrubber and show the types of data it is able to find and redact. There will be activities structured around data privacy terms and a "scavenger hunt" exploring different NLP tools. Finally, the presenters will share ways to promote data privacy and data deidentification tools at your institution.
Learning Objectives:
Participants will understand natural language processing, a type of AI, and how it can be applied to data deidentification.
Participants will learn about NLM Scrubber and other openly available deidentification tools and will explore ways to promote these tools at their institution.
Participants will leave the session with ideas on how to promote deidentification tools and how to incorporate them into library instruction and outreach programs.
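For readers new to the topic, the sketch below shows the simplest form of text redaction: replacing pattern matches with labeled placeholders. It is a pattern-based illustration only and is not how NLM Scrubber or other NLP-based tools work; the patterns, labels, and sample sentence are assumptions chosen for the example, and pattern matching alone is not sufficient to deidentify clinical text.

```python
import re

# Illustrative patterns only. Names, addresses, record numbers, and many other
# identifiers are not covered, which is why NLP-based tools exist.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text):
    """Replace each pattern match with a bracketed label such as [PHONE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample_note = "Seen on 3/14/2023; call 555-123-4567 or email jdoe@example.org."
redacted_note = redact(sample_note)
```

Keeping the label in the output (rather than deleting the match) preserves the sentence structure, which can make redacted text easier to review by hand.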
Wed, Oct 9
Short Talks 10:00 AM - 11:00 AM
10:00 AM Data-driven data services development: a research method for informed growth with a dynamic campus environment
Reid Boehm (Purdue University)
Kelly Burnes (Purdue University)
As library specialists who support campus researchers with data management and sharing, we find it difficult to identify a clear, holistic picture of researcher needs for services such as training, consultations, and the institutional data repository. Necessary growth in tandem with our campus research environment is underscored by anticipated requirement changes associated with the 2022 Office of Science and Technology Policy memo and other nascent policies and recommendations. This presentation will share a research design using formal content analysis of data management plans from grant awardees planning to use the data repository. It will provide an overview of the research and the implications of the results for our team. As a sample case study it will provide attendees: 1.) an outline of the method for collection and analysis, 2.) a way to engage with data for interpreting results, and 3.) a description of how specialists might choose to bring results into strategic and holistic service development.
Learning Objectives:
1.) an outline of the method for collection and analysis
2.) a way to engage with data for interpreting results, and
3.) a description of how specialists might choose to bring results into strategic and holistic service development.
10:20 AM Data Accessibility Checklist
Brandie Pullen (Virginia Tech)
Want to make your data more accessible but don't know where to start? Come to this lightning talk to hear about a Data Accessibility Checklist. This checklist takes out the why and how while giving you actionable tasks to complete to make digital content more accessible. The checklist covers a few short tasks you can do for each of the following data types: text, tabular, code, and image. This checklist should be used as a starting point for researchers and curators to make their content more accessible for everyone. This checklist goes along with the DCN's Accessibility Primer.
Learning Objectives:
Data accessibility concerns
How to make your data more accessible
10:40 AM Human Judgement and Data Curation in an Age of Automation
Aditya Ranganath (Center for Research Data and Digital Scholarship, University of Colorado Boulder)
Data librarians with an interest in data curation are increasingly interested in the role of automated workflows in facilitating data curation tasks. However, it is unlikely that all data curation workflows can be successfully automated; moreover, even in cases where automation is possible, it may not be desirable. This presentation builds on social scientific work on the relationship between human expertise and automation to speculate on the future division of labor between algorithmic decision-making and human judgement in the realm of data curation, and explores how the interaction between automated systems and human expertise can be structured in ways that maximize the quality of published data. The presentation will be appropriate for data librarians and professionals from all backgrounds and experience levels, but would be most appropriate for those with some familiarity with data curation.
Learning Objectives:
Attendees will learn more about automation in current data curation workflows, and possibilities for the future; they will also take away a framework for thinking systematically about the relationship between human judgement and algorithmic decision-making in the realm of data curation
Break 11:00 AM - 11:15 AM
Short Talks 11:15 AM - 11:55 AM
11:15 AM Rigor and reproducibility for health sciences graduate students
Nina Exner (Virginia Commonwealth University)
This session will talk about how data librarians can use their data skills to support reproducibility education. The presenter will share how a research data librarian is co-instructing on reproducibility with their school of medicine faculty. Although this class example is at a medical school, the presenter will also discuss the broader context of how the NIH emphasizes "rigor and reproducibility" in training programs. They will present how they frame reproducibility and data sharing in the wider context of health sciences and behavioral sciences trends, so that students understand the changing face of data practices in research. Finally, the presenter will share examples of resources that can be leveraged by data librarians or subject librarians for teaching about reproducibility and transparency.
Learning Objectives:
(1) What the NIH means when talking about rigor, reproducibility, and transparency for graduate students;
(2) How the presenter collaborated to design course objectives;
(3) An example of how to use specific existing resources to create student activities about reproducibility
11:35 AM Flipping the Data Lifecycle: Teaching the "Why" of Data Management to Graduate Students
Isaac Wink (University of Kentucky)
The increasing expectation across disciplines that researchers share their data has led to expanded use of data repositories, yet many shared datasets are not well curated. Data management is an essential skill for researchers in all disciplines, but it can be tricky to teach in a manner that sticks. This presentation (relevant to data librarians at any level with instruction responsibilities) will offer a lesson plan for teaching data management in reverse by first familiarizing students with the relevant data repositories in their discipline and then examining individual datasets they might reuse. It will also include insights from applying the lesson in a graduate-level entomology course. By focusing on a personally relevant dataset, students not only learn about general data management best practices but also have an example of how those practices translate to their discipline.
Learning Objectives:
1) develop class instruction that presents data management best practices as a set of solutions to data curation issues, and
2) leverage students' own identification of barriers to reusing data from repositories to connect good data management practices to the long-term value of their data
11:55 AM Closing Remarks
SEDLS 2024 Planning Committee
Break 12:00 PM - 1:00 PM
Workshop 1:00 PM - 2:30 PM
1:00 PM Harnessing the Power of AI for Data Analysis and Visualization
Robert Laws (Georgetown University in Qatar)
This workshop introduces data librarians, researchers, and professionals to the latest AI technologies and trends revolutionizing data analysis and visualization. Participants will explore the capabilities of large language models (LLMs) like ChatGPT for processing, cleaning, transforming, and outputting data efficiently. The workshop will also cover industry tools such as Tableau that are rapidly integrating AI technologies into their existing software and services.
During the hands-on portion of the workshop, attendees will follow along using the free version of ChatGPT, which can perform file uploads and data analysis as of May 2024. They will learn to craft effective prompts for data processing and formulate queries to analyze the prepared data. By the end of the session, participants will have a better understanding of how to leverage AI tools to enhance their data analysis and visualization workflows.
Learning Objectives:
1) Familiarity with the latest AI technologies and trends, such as large language models (LLMs) like ChatGPT, and their applications in data analysis and visualization.
2) Hands-on experience crafting effective prompts for data processing and formulating queries to analyze data.