3 changes: 3 additions & 0 deletions pydata-global-2024/category.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"title": "PyData Global 2024"
}
@@ -0,0 +1,28 @@
{
"description": "DuckDB is revolutionizing data processing by enabling in-memory OLAP SQL operations with a lightweight, dependency-free architecture. This talk explores how DuckDB can be leveraged to handle large-scale, massively parallel data processing, ranging from hundreds of gigabytes to terabytes, outside traditional SQL and Spark warehouse systems. We will go over its integration with the Python ecosystem and demonstrate its scaling potential using cloud compute.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1772,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Adarsh Namala"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/qSs5ALVbzTk/maxresdefault.jpg",
"title": "Scaling Outside the Warehouse Using DuckDB and Python",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=qSs5ALVbzTk"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "Hi! Have you ever wished your pure Python libraries were faster? Or wanted to fundamentally improve a Python library by rewriting everything in a faster language like C or Rust? Well, wish no more... NetworkX's backend dispatching mechanism redirects your plain old NetworkX function calls to a FASTER implementation in a separate backend package by leveraging Python's entry_point specification!\n\nNetworkX is a popular, pure Python library used for graph (aka network) analysis. But as graphs grow (think of a network of everyone in the world), NetworkX algorithms can take days to solve a simple graph analysis problem. To address these performance issues, a backend dispatching mechanism was recently developed. In this talk, we will unveil this dispatching mechanism and its implementation details, and show how you can use it just by specifying a backend kwarg like this:\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1746,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Aditi Juneja"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/2UkZVKj6QGY/maxresdefault.jpg",
"title": "Understanding API Dispatching in NetworkX",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=2UkZVKj6QGY"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "This talk will cover how to use pre-trained HuggingFace models, specifically wav2vec 2.0 and WavLM, to detect audio deepfakes. These deepfakes, made possible by advanced voice cloning tools like ElevenLabs and Respeecher, present risks in areas like misinformation, fraud, and privacy violations. The session will introduce deepfake audio, discuss current trends in voice cloning, and provide a hands-on tutorial for using these transformer-based models to identify synthetic voices by spotting subtle anomalies. Participants will learn how to set up these models, analyze deepfake audio datasets, and assess detection performance, bridging the gap between speech generation and detection technologies.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1857,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Adriana Stan"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/MGRmKlDj9rk/maxresdefault.jpg",
"title": "Off-the-shelf HuggingFace models for audio deepfake detection",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=MGRmKlDj9rk"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "Beneath the buzz of AI breakthroughs, a quiet revolution is unfolding in the world of forecasting: foundational time series models. These models promise to change the game for operational forecasting, but don’t expect magic. You won’t suddenly become a stock market oracle just by throwing data at them.\n\nIn this talk, we’ll peel back the layers of these new time series models, starting with how they work and how they evolved from transformers. We’ll tackle the big problems of limited data and overhyped algorithms, and explore the real-world challenges that make or break forecasts (hint: human input matters).\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1865,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Ahad Shoaib"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/5Nt0p_3zU7g/maxresdefault.jpg",
"title": "Foundational Time Series Models in Practice: The Future of Forecasting, or Just Hype?",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=5Nt0p_3zU7g"
}
]
}
@@ -0,0 +1,29 @@
{
"description": "Vector databases are everywhere, powering LLMs. But indexing embeddings in bulk, especially multivector embeddings like ColPali and ColBERT, is memory-intensive. Vector streaming solves this problem by parallelizing parsing, chunking, and embedding generation, and by indexing the embeddings continuously, chunk by chunk, instead of in bulk. This not only increases speed but also makes the whole process more memory efficient.\n\nThe library supports many vector databases, such as Pinecone, Weaviate, and Elastic.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1680,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Akshay Ballal",
"Sonam Pankaj"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/FdOeLY3rGA8/maxresdefault.jpg",
"title": "The Memory Efficient Indexing for Vector Databases",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=FdOeLY3rGA8"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "Time series analysis provides essential tools for modeling and predicting time-dependent data, especially data exhibiting seasonal patterns or serial correlation. This tutorial covers tools in the StatsModels library including seasonal decomposition and ARIMA. We'll develop the ARIMA model bottom-up, implementing it one piece at a time, and then we'll use StatsModels. As examples, we'll look at weather data and electricity generation from renewable sources in the United States since 2004 -- but the methods we'll cover apply to many kinds of real-world time series data.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 5376,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Allen Downey"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/foMbacbuAQk/maxresdefault.jpg",
"title": "Time Series Analysis with StatsModels",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=foMbacbuAQk"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "Knowledge graphs are excellent at representing and storing heterogeneous and interconnected information in a structured manner, effectively capturing complex relationships and attributes across different data types.\nStructured text generation allows for building knowledge graphs by providing neatly structured outputs, making it an ideal method for extracting structured information.\nSimilarly, structured text generation enables the creation of agents by defining which tools are allowed and what action inputs are permitted.\nIn this talk, we first build a graph database from unstructured data and then we create an agent to query the graph database. We will show these capabilities with a demo.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1696,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Alonso Silva"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/94yuQKoDKkE/maxresdefault.jpg",
"title": "Building Knowledge Graph-Based Agents with Structured Text Generation",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=94yuQKoDKkE"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "Taking any project from zero to production is challenging, and Data Science has a particularly high failure rate, with many ideas never getting beyond the prototype stage.\n\nBut there are real reasons for this: there is intrinsic and unknown complexity in data, and it is often hard to know whether we have actually solved the problem -- the answer is so rarely \"yes\" or \"no\".\n\nIn this talk I'll cover some key learnings from a decade of working on DS problems at early- and later-stage startups, building products to improve product-market fit.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1706,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Andrew Weeks"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/FA1TWdxoyV4/maxresdefault.jpg",
"title": "Taking Data Science in industry from zero to production",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=FA1TWdxoyV4"
}
]
}
@@ -0,0 +1,28 @@
{
"description": "This talk showcases and exemplifies the rapid specification and execution of Quantile Regression workflows. Various use cases are discussed, including fitting, outlier detection, conditional CDFs, and simulations, using different types of time series data.\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.",
"duration": 1752,
"language": "eng",
"recorded": "2024-12-03",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/global2024"
},
{
"label": "https://github.com/numfocus/YouTubeVideoTimestamps",
"url": "https://github.com/numfocus/YouTubeVideoTimestamps"
}
],
"speakers": [
"Anton Antonov"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/Z2uz7kwBli8/maxresdefault.jpg",
"title": "Quantile Regression Workflows",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=Z2uz7kwBli8"
}
]
}