/r/Python

The official Python community for Reddit! Stay up to date with the latest news, packages, and meta information relating to the Python programming language.

If you have questions or are new to Python, use r/LearnPython.

News about the dynamic, interpreted, interactive, object-oriented, extensible programming language Python

Please read the rules

You can find the rules here.

If you are about to ask a "how do I do this in python" question, please try r/learnpython, the Python discord, or the #python IRC channel on Libera.chat.

Please don't use URL shorteners. Reddit filters them out, so your post or comment will be lost.

Posts require flair. Please use the flair selector to choose your topic.

Posting code to this subreddit:

Add 4 extra spaces before each line of code

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

1,292,998 Subscribers

5

Finally Completed : A Personal Project built over the weekend(s) - Netflix Subtitle Translator

Motivation: Last week, I posted about my project, Netfly: The Netflix Translator, here on r/python. I initially built it to solve a problem I ran into while traveling. Let me explain:

On a flight from New Delhi to Tokyo, I started watching an anime movie, The Concierge. The in-flight entertainment had English subtitles, and I was hooked, but I couldn’t finish it. Later, I found the movie on Netflix Japan, but it was only available with Japanese subtitles.

Here’s the problem: I don’t know enough Japanese (Nihongo wa sukoshi desu) to follow along, so I decided to build something that could fetch those Japanese subtitles, translate them into English, and overlay the translation on the video while retaining the Japanese subtitles which would give me better context.

What started as a personal project quickly became an obsession.

What the Project Does: The primary goal of this project is simple: convert Japanese subtitles on Netflix into English subtitles in an automated way. This is particularly useful when English subtitles aren’t available for a title.

The Evolution of this Project / High-Level Tech Solution: This is not the first iteration of Netfly. It has gone through two major updates based on feedback and my own learning.

Iteration 1: A Tech-Heavy but Costly Solution

How It Worked:

The Result: It worked, but it was far from practical. The cost of using Google Vision API for every frame made it unsustainable, and the whole process was painfully slow.

Iteration 2: Streamlining with the Subtitle File

  • I discovered Netflix subtitles can be downloaded (through some effort).
  • Parsed the downloaded XML subtitle file using lxml to extract the Japanese text, start time, and end time via XPath.
  • Sent the extracted text to AWS Translate for English translation.

The Result: This was much better: cheaper, faster, and simpler. But there was still a manual step: downloading the subtitle file.
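A minimal sketch of the Iteration 2 flow, not the project's actual code. It assumes the subtitle file is TTML-like XML with `<p begin="..." end="...">` cues (the sample document below is made up), uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra installs, and replaces the AWS Translate call with a hypothetical `fake_translate` stand-in.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the real Netflix file layout is an assumption here.
SAMPLE_XML = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">こんにちは</p>
      <p begin="00:00:04.000" end="00:00:06.000">ありがとう<br/>ございます</p>
    </div>
  </body>
</tt>"""

NS = {"tt": "http://www.w3.org/ns/ttml"}


def extract_cues(xml_text):
    """Return one dict per subtitle cue with its start time, end time, and text."""
    root = ET.fromstring(xml_text)
    cues = []
    for p in root.findall(".//tt:p", NS):
        # itertext() also picks up text that follows inline tags such as <br/>.
        text = " ".join(part.strip() for part in p.itertext() if part.strip())
        cues.append({"start": p.get("begin"), "end": p.get("end"), "text": text})
    return cues


def translate_cues(cues, translate):
    """Attach a translation to every cue using the supplied callable."""
    return [{**cue, "translation": translate(cue["text"])} for cue in cues]


def fake_translate(text):
    # Hypothetical stand-in for a real translation service call.
    return f"[en] {text}"


translated = translate_cues(extract_cues(SAMPLE_XML), fake_translate)
print(json.dumps(translated, ensure_ascii=False, indent=2))
```

Passing the translator in as a callable keeps the parsing step testable offline; a real service client could be dropped in later without touching `extract_cues`.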

Iteration 3: Fully Automated Workflow

  • Integrated a Playwright script that logs into Netflix, navigates to the selected video, and downloads the subtitle XML file automatically.
  • Added a CLI using Python’s Click library to simplify running the workflow.
  • Once the XML file is fetched, the script extracts Japanese text and timestamps, sends the text to AWS Translate, and generates English subtitles in a JSON format.

The Result: All steps are now fully automated.

Target Audience: This project started as a personal tool, but it can be useful for:

  • Language Enthusiasts: Anyone who wants to watch Netflix content in languages they don’t understand.
  • Developers: If you’re exploring libraries like playwright, lxml, click, or translation workflows, this project can be a solid learning resource.

Comparison with Other Similar Tools: Existing tools, like Chrome extensions, rely on pre-existing subtitles in the target language. For example, they can overlay English subtitles, but only if those subtitles are already available. Netfly is different because:

  • It handles cases where English subtitles don’t exist.
  • It automates the entire process, from fetching Japanese subtitles to translating them into English.
  • It provides an end-to-end workflow with minimal manual effort.

To the best of my knowledge, no other tool automates this entire flow.

Working Demo / Screenshots:
https://imgur.com/a/vWxPCua
https://imgur.com/a/zsVkxhT

https://imgur.com/a/bWHRK5H
https://imgur.com/a/pJ6Pnoc

What's Next: This is still a work in progress, but I feel it’s in a solid state now. Here’s what’s on my mind for the next steps:

  1. Edge Cases: Testing on a broader range of Netflix titles to handle variations in subtitle formats.

  2. Performance: Optimizing XML parsing and translation for faster processing.

  3. Extensibility: Adding support for other subtitle languages.

  4. Error Handling: Since I iterated very fast, I know the error handling is not up to the mark.

If this sounds interesting to you, the code is up on GitHub: https://github.com/Anubhav9/Netfly-subtitle-converter-xml-approach

I’d love to hear your thoughts, feedback, and suggestions on this.
Cheers, and thank you!

0 Comments
2024/11/16
12:15 UTC

0

Power Automate Application Hosted on the Windows server with IIS. Python watchdog too.

Hi potential bots,

I'm a backend developer who works with Python and Flask. I also recently started using the IIS thingy to host our RESTful API backend on an on-premises Windows server. Damn! Nice intro I got.

So, the issue: I want/need to host a Power Automate application (Desktop, or whatever that blue box-code-like software is called) on a Windows server using IIS. It should be running all the time, but the VM might get locked after some time.

I also have a solution there that uses a watchdog to do some stuff after PA's processing is done (an Excel creation automation task).

So, sharks, my ask would be: how the fruit do I set up a Power Automate application when I have never worked with it? Please share detailed steps or else I might bite you.

Regards, Your BF

P.S.: I don't know a thing. Pls just 🍻 with me. Nor did I search for this on Bing 😏.

  • I also posted the same in the MS community, but I believe more in peeps here.

TL;DR: How to host a Power Automate Desktop application on a Windows server and keep it running forever.

4 Comments
2024/11/16
08:46 UTC

3

Saturday Daily Thread: Resource Request and Sharing! Daily Thread

Weekly Thread: Resource Request and Sharing 📚

Stumbled upon a useful Python resource? Or are you looking for a guide on a specific topic? Welcome to the Resource Request and Sharing thread!

How it Works:

  1. Request: Can't find a resource on a particular topic? Ask here!
  2. Share: Found something useful? Share it with the community.
  3. Review: Give or get opinions on Python resources you've used.

Guidelines:

  • Please include the type of resource (e.g., book, video, article) and the topic.
  • Always be respectful when reviewing someone else's shared resource.

Example Shares:

  1. Book: "Fluent Python" - Great for understanding Pythonic idioms.
  2. Video: Python Data Structures - Excellent overview of Python's built-in data structures.
  3. Article: Understanding Python Decorators - A deep dive into decorators.

Example Requests:

  1. Looking for: Video tutorials on web scraping with Python.
  2. Need: Book recommendations for Python machine learning.

Share the knowledge, enrich the community. Happy learning! 🌟

0 Comments
2024/11/16
00:00 UTC

35

Game 987, Like 2048 but Fibonacci (Made in Python)

https://987.reflex.dev/

What My Project Does

From Adhami, the author: I was wondering what 2048 would feel like if, instead of powers of two, we could merge consecutive Fibonacci numbers. It turns out to be a rather interesting game that is fairly forgiving and grows very slowly. I found it difficult to come up with an overall strategy. I had a simple search algorithm that was able to achieve a score of exactly 66,666 (not joking). Getting a 987 block shouldn't be difficult.

You can take a look into the code here: https://github.com/adhami3310/987 (the simple search algorithm is inside the code as well)

Target Audience: Anyone

Comparison: Similar to 2048 but fib
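A rough sketch of the merge rule described above, written from the description rather than taken from the linked repository. It assumes two tiles merge only when they are neighbours in the Fibonacci sequence (1+1, 1+2, 2+3, ...) and that each tile merges at most once per move; `slide_left` and its handling of empty cells (0) are assumptions for illustration.

```python
def fib_upto(limit):
    """Fibonacci sequence 1, 1, 2, 3, ... up to and including the first value >= limit."""
    seq = [1, 1]
    while seq[-1] < limit:
        seq.append(seq[-1] + seq[-2])
    return seq


FIB = fib_upto(987)
# Every pair of neighbours in the sequence, stored as (smaller, larger).
MERGEABLE = {(a, b) for a, b in zip(FIB, FIB[1:])}


def can_merge(a, b):
    """True when a and b are consecutive Fibonacci numbers, in either order."""
    return (min(a, b), max(a, b)) in MERGEABLE


def slide_left(row):
    """Slide one row to the left, merging each tile at most once; 0 is an empty cell."""
    tiles = [t for t in row if t]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and can_merge(tiles[i], tiles[i + 1]):
            merged.append(tiles[i] + tiles[i + 1])
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))


print(slide_left([1, 1, 2, 0]))  # [2, 2, 0, 0]
print(slide_left([2, 3, 5, 8]))  # [5, 13, 0, 0]
```

Unlike 2048, equal tiles above 1 never merge (2 and 2 stay apart), which is one reason the board grows slowly.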

2 Comments
2024/11/15
19:57 UTC

95

PyPI now has attestation. Thanks I hate it.

Blog post: https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-digital-attestations/

I'm angry that it got partially funded by the Sovereign Tech Fund, when it's about "securing" uploads by giving the keys to huge USA companies. I think it's criminal they got public money for this.

I also don't think it adds any security whatsoever. It just moves the authentication from using credentials for PyPI to using credentials for GitHub. They can be stolen in the exact same way.

edit: It got "GERMAN" public money.

104 Comments
2024/11/15
16:24 UTC

6

Yami - A music player made with Tkinter, now on PyPI!

I would like some user feedback.
GitHub link: https://github.com/DevER-M/yami
PyPI link: https://pypi.org/project/yami-music-player/
Some of the features:

  • MP3, FLAC, and many other audio formats supported for playback
  • Clean UI
  • Can download music with cover art
  • It is also asynchronous

Libraries used

  • customtkinter
  • spotdl
  • mutagen

Target audience
This project will be useful for people who do not want ads and want a simple user interface to play music

Comparison
There are currently no projects that cover all of these features and are made with Tkinter.

To use this, install all requirements in the .txt file and you are good to go.

RoadMap
I will update it now and then

A follow would be nice! https://github.com/DevER-M

3 Comments
2024/11/15
16:15 UTC

0

What is wrong with face_recognition

So I wanted to build a face_recognition attendance system, but heck, there is always an error with this or dlib. At first it would not install, and now that it is installed it isn't working properly. I triple-checked the code; the issue is with this library. On Linux it runs shockingly well, but unfortunately I have to use Windows.

3 Comments
2024/11/15
15:13 UTC

0

http://awakenerd.com/2024/11/15/puzzles-to-improve-python/

I compiled a list of puzzles to improve Python. I hope this blog post serves as a humble guide for anyone interested in improving their Python by solving puzzles.

4 Comments
2024/11/15
15:08 UTC

3

Need project Idea

Hello everyone, a Python programmer here. Just wondering if there are any project or research ideas that could be implemented in the field of space exploration/technology, because I'm obsessed with space ;) Just give me suggestions. Happy coding ;)

9 Comments
2024/11/15
11:48 UTC

11

fxgui: Collection of Python Classes and Utilities designed for Building Qt-based UIs in VFX

Hey Python enthusiasts! Any VFX folks here? I've developed a little package called fxgui - a collection of Python classes and utilities designed for building Qt-based UIs in VFX-focused DCC applications.

It's available on GitHub and PyPI, and comes with documentation. I'd love to hear your thoughts and get some feedback!

Target Audience

  • VFX/CGI people working from multiple DCCs.

Key Features

  • Quick setup of common widgets.
  • Reusable custom UI components.
  • Fully compatible with PySide2/PySide6, thanks to qtpy.

Comparison

  • Specifically designed for multi-DCC environments (Maya, Houdini, Nuke, etc.).
  • Saves development time by offering ready-to-use components.
  • Maintains consistency and standardization across projects and DCCs.
1 Comment
2024/11/15
11:27 UTC

29

Dispatchery: Type-aware, multi-arg function dispatch for complex and nested Python types

Links: Github, PyPI

What it does:

dispatchery is a lightweight Python package for function dispatching inspired by the standard singledispatch decorator, but with support for complex, nested, parameterized types such as tuple[str, dict[str, int | float]].

Comparison:

Unlike singledispatch, dispatchery can dispatch based on:

  • Generic parameterized types (e.g. list[int])
  • Nested types (e.g. tuple[str, dict[str, int | float]])
  • Union types (e.g. int | str or Union[int, str])
  • Multiple arg and kwarg values, not just the first one

Target Audience:

Python developers who don't like having a bunch of if isinstance checks everywhere in their code.
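For contrast, here is a small sketch of what the standard library's `functools.singledispatch` leaves to the caller. It dispatches on the class of the first argument only, so telling a list of strings from a list of numbers still takes hand-written `isinstance` checks. The function name `describe` and its return strings are made up for illustration.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    return "Standard stuff."


@describe.register
def _(value: list):
    # singledispatch only sees `list`, so element types are checked by hand.
    if value and all(isinstance(v, str) for v in value):
        return "Strings!"
    if value and all(isinstance(v, (int, float)) for v in value):
        return "Numbers!"
    return "Some other list."


print(describe(42))          # Standard stuff.
print(describe(["a", "b"]))  # Strings!
print(describe([1, 2.5]))    # Numbers!
```

These manual checks are the kind of boilerplate that dispatching on parameterized types is meant to remove.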

Example :

from dispatchery import dispatchery

@dispatchery
def my_func(value):
    return "Standard stuff."

@my_func.register(list[str])
def _(value):
    return "Strings!"

@my_func.register(list[int] | list[float])
def _(value):
    return "Numbers!"

@my_func.register(str, int | float, option=str)
def _(value1, value2, option):
    return "Two values and a kwarg!"

# my_func(42) or my_func("hello") will return "Standard stuff."
# my_func(["a", "b", "c"]) will return "Strings!"
# my_func([1, 2, 3]) or my_func([0.2, 0.5, 1.2]) will return "Numbers!"
# my_func("hello", 42, option="test") will return "Two values and a kwarg!"

Installation:

pip install dispatchery

See the full README on Github.

MIT license, feedback welcome!

14 Comments
2024/11/15
09:55 UTC

23

The Ultimate Guide to Implement Function Overloading in Python

Anyone who has learned Java should be familiar with function overloading. One of the most common uses is logging, where different overloaded functions are called for different parameters. So, how can we implement function overloading in Python? This post explains how: The Ultimate Guide to Implement Function Overloading in Python

37 Comments
2024/11/15
04:36 UTC

44

I played a minute-long video in Windows Terminal

I recently worked on a project combining my love for terminal limits and video art. Here’s what I achieved:

  • Rendered a 1-minute-long (almost two) ASCII video in the terminal, without graphics libraries or external frameworks.
  • Used true 24-bit colors for each frame, offering deeper color representation in terminal-based projects.
  • Processed 432 million characters over 228 seconds, translating each frame’s pixels to colors.
  • Optimized performance with multi-processing, running on an integrated graphics card.

Specs:

  • 30 FPS
  • 160,000+ characters per frame
  • 2,700 frames
  • 3 pixels per character for better performance

For further optimization, I reduced the font size to 3 pixels and used background colors to handle brightness.
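A small sketch of the general technique, not the project's code: each pixel becomes a space character whose background is set with a 24-bit ANSI escape sequence (`ESC[48;2;R;G;Bm`). The helper names `cell` and `render_frame` are made up, and the tiny 2x2 frame is only a placeholder. The last lines check the arithmetic behind the specs above.

```python
RESET = "\x1b[0m"


def cell(r, g, b):
    """One character cell: a space drawn on a 24-bit background color."""
    return f"\x1b[48;2;{r};{g};{b}m "


def render_frame(pixels):
    """Turn rows of (r, g, b) tuples into a printable string, resetting colors per line."""
    return "\n".join("".join(cell(*rgb) for rgb in row) + RESET for row in pixels)


# Placeholder 2x2 frame: red, green / blue, white.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(render_frame(frame))

# Arithmetic from the specs: 160,000 characters per frame over 2,700 frames at 30 FPS.
total_chars = 160_000 * 2_700
duration_seconds = 2_700 / 30
print(total_chars)       # 432000000
print(duration_seconds)  # 90.0
```

The numbers line up with the post: 432 million characters in total, and 90 seconds of video, which is the "1 minute, almost two" mentioned above.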

What my project does: While not the most practical project, it’s an experiment I’m satisfied with. No real use, but hey, it’s fun!

Target audience: This is more of a fun project, so I can't say it has a specific target audience, but people who strangely feel good coding "useless" things might like it.

Comparison
To be precise, it is not an ASCII player anymore; what it does now is display video in the terminal using basically pure ANSI. I don't think there is an exact alternative, since it doesn't serve a specific purpose other than, well, displaying video with text. It is a fun project.

P.S. I’m considering rewriting the frame conversion in C to speed things up. More improvements are coming soon!

That’s it. You can watch a preview with Tank! from Cowboy Bebop (ignore the random color stripes; I had to do some optimization and wasn’t very precise with the difference calculation).

You can find the repo here, but be aware that the current version has not been pushed to GitHub yet. Feel free to analyze the old versions/commits if you like; I will update the repo when I release the current code.

OBS: changefontsize.py only works with Windows Terminal, as it changes the default font of your profile. Because it degrades compatibility, it has been removed in the current version.

9 Comments
2024/11/15
04:18 UTC

33

I shared a Python Data Science Bootcamp (7+ Hours, 7 Courses and 3 Projects) on YouTube

Hello, I shared a Python Data Science Bootcamp on YouTube. The bootcamp is over 7 hours long and contains 7 courses with 3 projects. The courses are Python, Pandas, NumPy, Matplotlib, Seaborn, Plotly, and Scikit-learn. I am leaving the link below. Have a great day!

Bootcamp: https://www.youtube.com/watch?v=6gDLcTcePhM

Data Science Courses Playlist: https://youtube.com/playlist?list=PLTsu3dft3CWiow7L7WrCd27ohlra_5PGH&si=6WUpVwXeAKEs4tB6

11 Comments
2024/11/15
04:07 UTC

0

Cloudflare Turnstile

Any way to bypass this with Python and Chrome?

It's not on the front page, but inside the website itself.

The problem is that even when I click it manually, it still gives an error…

0 Comments
2024/11/15
02:05 UTC

4

Friday Daily Thread: r/Python Meta and Free-Talk Fridays

Weekly Thread: Meta Discussions and Free Talk Friday 🎙️

Welcome to Free Talk Friday on /r/Python! This is the place to discuss the r/Python community (meta discussions), Python news, projects, or anything else Python-related!

How it Works:

  1. Open Mic: Share your thoughts, questions, or anything you'd like related to Python or the community.
  2. Community Pulse: Discuss what you feel is working well or what could be improved in the /r/python community.
  3. News & Updates: Keep up-to-date with the latest in Python and share any news you find interesting.

Guidelines:

  • All topics should be related to Python or the /r/python community.
  • Be respectful and follow Reddit's Code of Conduct.

Example Topics:

  1. New Python Release: What do you think about the new features in Python 3.11?
  2. Community Events: Any Python meetups or webinars coming up?
  3. Learning Resources: Found a great Python tutorial? Share it here!
  4. Job Market: How has Python impacted your career?
  5. Hot Takes: Got a controversial Python opinion? Let's hear it!
  6. Community Ideas: Something you'd like to see us do? Tell us.

Let's keep the conversation going. Happy discussing! 🌟

5 Comments
2024/11/15
00:00 UTC

0

How can we iterate 10000 websites efficiently?

Hello, I have 10,000 websites to assess for reCAPTCHA implementation and am looking for a more efficient solution. Currently, I'm using Selenium and ThreadPoolExecutor, which depend heavily on my computer's processing power. I can only iterate through 5 or 10 sites simultaneously to run a JavaScript script and determine if reCAPTCHA is present. This method takes approximately 10 hours with just 5 threads in Python. I need a better approach to expedite this process.
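One possible direction, sketched under assumptions: skip the browser for a first pass, fetch each page's HTML over plain HTTP, and look for common reCAPTCHA markers (the `recaptcha/api.js` script URL or the `g-recaptcha` class). This is a different technique from running JavaScript in Selenium and will miss widgets injected only at runtime, so flagged and unflagged sites may still need a browser check. `fetch_html`, `scan`, and the worker count are illustrative names and values.

```python
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Markers that commonly appear in pages embedding reCAPTCHA (heuristic, not exhaustive).
RECAPTCHA_PATTERN = re.compile(
    r"google\.com/recaptcha/|recaptcha/api\.js|g-recaptcha", re.IGNORECASE
)


def has_recaptcha(html):
    """Heuristic check on static HTML; dynamically injected widgets are not detected."""
    return bool(RECAPTCHA_PATTERN.search(html))


def fetch_html(url, timeout=10):
    """Download a page as text; returns an empty string on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def scan(urls, fetch=fetch_html, max_workers=32):
    """Fetch pages concurrently (I/O-bound, so many threads are cheap) and flag each URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetch, urls)
        return {url: has_recaptcha(html) for url, html in zip(urls, pages)}


# Offline demonstration with made-up pages instead of real network calls.
FAKE_PAGES = {
    "https://a.example": '<script src="https://www.google.com/recaptcha/api.js"></script>',
    "https://b.example": "<html><body>No captcha here</body></html>",
}
print(scan(list(FAKE_PAGES), fetch=FAKE_PAGES.get))
```

Because plain HTTP requests spend most of their time waiting on the network, far more than 5 to 10 can run at once without taxing the CPU the way browser instances do.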

9 Comments
2024/11/14
20:38 UTC

37

Would a Pandas-compatible API powered by Polars be useful?

Hello, I don't know if this already exists, but I believe it would be great to have a library that gives you the same API as pandas but uses Polars under the hood when possible.

I have seen how powerful Polars is, but data scientists still use pandas a lot and it’s difficult to change habits. What do you think?

67 Comments
2024/11/14
19:48 UTC

11

SqueakyCleanText: A Modular Text Processing Library with Advanced NER

GitHub: SqueakyCleanText | PyPI: squeakycleantext

Happy to share SqueakyCleanText, a Python library designed to streamline text preprocessing for Natural Language Processing (NLP) and Machine Learning (ML) tasks. Whether you're working on language models, statistical ML pipelines, or any text-heavy application, this library aims to make your preprocessing pipeline more efficient and flexible.

🎯 Target Audience

  • Data Scientists, AI Engineers and Machine Learning Engineers dealing with text data.

  • NLP Researchers and NLP Linguists looking for customisable preprocessing tools.

  • Developers building applications that require text cleaning and anonymisation.

🔑 Key Features

  1. Advanced Named Entity Recognition (NER)

    • Ensemble of Models: Utilises multiple NER models from Hugging Face Transformers for improved accuracy.

    • Smart Text Chunking: Efficiently handles long texts by splitting them into optimized chunks.

    • Configurable Confidence Thresholds: Adjust the sensitivity of entity detection.

    • Configurable Models: Choose the NER models that suit your use case.

    • Configurable Positional Tags: Choose what you would like to be removed from the texts.

    • Automatic Language Detection: Supports English, German, Spanish, and Dutch with automatic model selection.

  2. Modular Pipeline Architecture

    • Toggle-able Features: Easily enable or disable any step in the pipeline.

    • Single and Batch Processing: Consistent configuration applies to both modes.

    • Default Pipeline Includes:

      • Bad Unicode correction

      • HTML and URL handling

      • Contact information anonymization (emails, phone numbers)

      • Date and number normalization

      • Advanced NER processing

      • Whitespace and punctuation normalization

  3. Performance Optimizations

    • Under-the-Hood NER Improvements: Enhanced NER processing delivers faster results without compromising accuracy.

    • Batch Processing Support: Process large datasets efficiently with configurable batch sizes.

    • Memory Management: Automatic cleanup of GPU memory to handle large-scale processing.

🚀 Comparison

  • Comprehensive and Modular: Unlike libraries that focus on specific tasks, SqueakyCleanText offers a full suite of preprocessing steps that you can customize to your needs.

  • Advanced NER Integration: Combines multiple NER models and uses smart chunking to improve entity recognition in long texts.

  • Dual Output Formats: Provides both language model-formatted text and statistical model-formatted text in a single pass.

  • Easy Integration: Designed to seamlessly fit into existing workflows with minimal adjustments.

💻 Quick Start Guide

Installation

pip install SqueakyCleanText

🛠 Integrate into Your Workflow

  • Customizable Pipeline: Tailor the preprocessing steps to match your project's requirements by toggling features in config.py.

  • Seamless NER Integration: Use the advanced NER processing to anonymize sensitive data or extract entities for downstream tasks.

  • Flexible Processing: Apply the same configurations to both single and batch processing modes without changing your code.

  • Efficient for Large Datasets: Leverage batch processing and memory optimizations to handle large volumes of text data.

7 Comments
2024/11/14
18:52 UTC

0

Can you solve this Python riddle I made for my colleagues?

A manager from a sister team came to me and asked me to produce the most obscure Python code I could come up with, because she wanted to give her developers a challenge. The requirement was that it should produce a code that could be sent in a text message to get the next challenge. And no, you are not allowed to run it :) They solved it in 30 minutes. Can you solve it?

import inspect
def code_as_it_was_meant_to_be(tmp):
    """
    www.lexico.com/definition/code
    "A system of words, letters, figures, or symbols used to represent others,
    especially for the purposes of secrecy."
    Send what is printed out by running this function in a text message to xxx
    """
    if len(set(tmp)) * 2 > len(tmp):
        tmp = eval(inspect.stack()[1][4][0].replace(tmp, tmp + tmp[::-1]))
        print(
            "".join(
                str(chr((ord(tmp[i * 2]) + ord(tmp[-(i + 1) * 2])) // 2))
                for i in range(len(tmp) // 4)
            )
        )
    else:
        return tmp[::-1]


code_as_it_was_meant_to_be("d,W3b6`@")
5 Comments
2024/11/14
18:27 UTC

41

Make your Github profile more attractive as a Python Developer

What My Project Does:

This project automates the process of showcasing detailed analytics and visual insights of your Python repositories on your GitHub profile using GitHub Actions. Once set up, it gathers and updates key statistics on every push, appending the latest information to the bottom of your README without disrupting existing content. The visualizations are compiled into a gif, ensuring that your profile remains clean and visually engaging.

With this tool, you can automatically analyze, generate, and display visuals for the following metrics:

- Repository breakdown by commits and lines of Python code

- Heatmap of commit activity by day and time

- Word cloud of commit messages

- File type distribution across repositories

- Libraries used in each repository

- Construct counts (including loops, classes, control flow statements, async functions, etc.)

- Highlights of the most recent closed PRs and commits

By implementing these automated insights, your profile stays up-to-date with real-time data, giving visitors a dynamic view of your work without any manual effort.
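As an illustration of how construct counts like those listed above could be gathered, here is a sketch using the standard `ast` module. It is not taken from the project; the `CONSTRUCTS` mapping, the `count_constructs` helper, and the sample source are assumptions.

```python
import ast
from collections import Counter

# Node types to count, mapped to a readable label.
CONSTRUCTS = {
    ast.For: "for",
    ast.While: "while",
    ast.If: "if",
    ast.Try: "try",
    ast.ClassDef: "class",
    ast.FunctionDef: "def",
    ast.AsyncFunctionDef: "async def",
}


def count_constructs(source):
    """Parse Python source and count how often each tracked construct appears."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = CONSTRUCTS.get(type(node))
        if label:
            counts[label] += 1
    return counts


SAMPLE = '''
class Greeter:
    def greet(self, names):
        for name in names:
            if name:
                print(name)

async def main():
    while False:
        pass
'''

print(count_constructs(SAMPLE))
```

Parsing with `ast` counts real syntax nodes, so keywords that appear inside strings or comments are not miscounted the way a plain text search would.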

---

Target Audience:

This tool is designed for Python developers and GitHub users who want to showcase their project activity, code structure, and commit history visually on their profile. It’s ideal for those who value continuous profile enhancement with minimal maintenance, making it useful for developers focused on building a robust GitHub presence or professionals looking to highlight their coding activity to potential collaborators or employers.

---

Comparison:

I haven't seen other tools like this, but by using GitHub Actions, this project ensures that new data is gathered and appended automatically, including in-depth insights such as commit activity heatmaps, word clouds, and code construct counts. This makes it more comprehensive and effortless to maintain than alternatives that require additional steps or only offer limited metrics.

Repo:

https://github.com/sockheadrps/PyProfileDataGen

Example:

https://github.com/sockheadrps

Youtube Tutorial:

https://youtu.be/Ls7sTjXEMiI

18 Comments
2024/11/14
13:27 UTC

5

Python Project Recommendations to Search for Flights in a Specific Time Range

Hello, fellow Python enthusiasts!

I am interested in exploring Python projects that can search for and identify the best flight options within a specified date range, such as a particular month like April 2024 or a broader range. This type of feature was once handled efficiently by services like Skyscanner, and I would love to find Python tools or open-source projects capable of similar functionality today.

If you know of any relevant resources, projects, or libraries, I’d greatly appreciate your suggestions!

Many thanks in advance for your input and help!

1 Comment
2024/11/14
06:48 UTC

2

Thursday Daily Thread: Python Careers, Courses, and Furthering Education!

Weekly Thread: Professional Use, Jobs, and Education 🏢

Welcome to this week's discussion on Python in the professional world! This is your spot to talk about job hunting, career growth, and educational resources in Python. Please note, this thread is not for recruitment.


How it Works:

  1. Career Talk: Discuss using Python in your job, or the job market for Python roles.
  2. Education Q&A: Ask or answer questions about Python courses, certifications, and educational resources.
  3. Workplace Chat: Share your experiences, challenges, or success stories about using Python professionally.

Guidelines:

  • This thread is not for recruitment. For job postings, please see r/PythonJobs or the recruitment thread in the sidebar.
  • Keep discussions relevant to Python in the professional and educational context.

Example Topics:

  1. Career Paths: What kinds of roles are out there for Python developers?
  2. Certifications: Are Python certifications worth it?
  3. Course Recommendations: Any good advanced Python courses to recommend?
  4. Workplace Tools: What Python libraries are indispensable in your professional work?
  5. Interview Tips: What types of Python questions are commonly asked in interviews?

Let's help each other grow in our careers and education. Happy discussing! 🌟

0 Comments
2024/11/14
00:00 UTC

46

PyPIM is a new method to execute Python code directly in RAM

https://www.techspot.com/news/105557-pypim-new-method-execute-python-code-directly-ram.html

Performance can be significantly improved when the CPU is not involved

21 Comments
2024/11/13
22:32 UTC

0

Project Ideas needed for mathematics major.

This is for a mathematics project that is due next Monday.

I am an undergraduate student in India majoring in mathematics. My professor asked me to present a mathematical solution in the form of either a project or a paper.

Now I know I am not going to end up with a paper, and I don't even have the time left for that.

The project was due next month, but now I need to do it all in a weekend.

My core interests are in data science and AI but I am quite open for projects in Business simulation, Optimization and Finance (professor's core subjects)

Project Ideas that I had ChatGPTed or figured out myself:

  1. Performing a Network Analysis on Delhi Metro and finding the shortest routes using networkx (This is the one I was currently doing)

  2. Deploying Trade strategies using Stochastic calculus and employing trade indicators on historical data (AKA technical analysis) (Abandoned project from last semester)

  3. Creating a CLI-based Computer Algebra System/mathematics language that takes commands and gives back outputs:

calculus integrate y:=sin(x) with respect to x
plot y^2 == 4x

I know the third one is silly because many advanced tools exist and this will never be able to reach that level of complexity.

I need you all to help me figure out how to choose a project idea...
Any other project ideas are also welcome (primarily from mathematics, data science, machine learning, and finance).
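For the first idea, the core computation is a weighted shortest-path search. The sketch below implements Dijkstra's algorithm with the standard library instead of networkx (which offers the same thing through `shortest_path` with a `weight` argument). The station names and travel times are a made-up toy network, not Delhi Metro data.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_minutes, [stations]) or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, station, path = heapq.heappop(queue)
        if station == goal:
            return cost, path
        if cost > best.get(station, float("inf")):
            continue  # A cheaper way to this station was already processed.
        for neighbour, minutes in graph.get(station, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical toy network: travel time in minutes between adjacent stations.
TOY_NETWORK = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}

print(shortest_route(TOY_NETWORK, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

The direct-looking route A, B, D costs 9 minutes, while the detour through C costs 8, which is the kind of result that makes a network analysis worth presenting.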
3 Comments
2024/11/13
22:06 UTC

5

[AXM] A simple "Assembly"-like interpreter

What My Project Does

Over the past week, I have been developing an assembly-like interpreter for my custom language, which I call AXM. AXM is intended to resemble assembly language, but with a slightly more accessible syntax. Although the interpreter is currently written in Python and still in its early stages, it serves as a "toy" interpreter to test out language design concepts.

Target Audience

This project is primarily a toy rather than a production-ready tool. It’s not designed for practical applications but rather for exploration and learning. The syntax is heavily inspired by assembly languages but is simplified to make it a bit easier to work with. Anyone interested in language development or assembly-like languages might find it interesting to explore.

Comparison

AXM is distinct from existing assembly languages because it focuses more on accessibility and is designed to be relatively simple, rather than optimized for performance or real-world use. Unlike traditional assembly, AXM is an interpreted language, allowing users to run code directly without needing to compile it. While there are other interpreters for assembly-inspired languages, AXM aims to balance simplicity with the principles of low-level programming, making it somewhat unique.
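To give a feel for what an assembly-like interpreter written in Python can look like, here is a generic toy sketch. It is not AXM: the instruction names (MOV, ADD, SUB, JNZ, PRINT), the label syntax, and the `run` function are all invented for illustration.

```python
def run(program):
    """Interpret a tiny assembly-like program; returns the list of printed values."""
    registers = {}
    output = []
    lines = [line.split() for line in program.strip().splitlines() if line.strip()]
    # Map each "name:" label to the index of the line it sits on.
    labels = {parts[0][:-1]: i for i, parts in enumerate(lines) if parts[0].endswith(":")}

    def value(token):
        # A token is either a known register name or an integer literal.
        return registers[token] if token in registers else int(token)

    pc = 0  # Program counter: index of the next line to execute.
    while pc < len(lines):
        op, *args = lines[pc]
        pc += 1
        if op.endswith(":"):
            continue  # Labels are markers only.
        if op == "MOV":
            registers[args[0]] = value(args[1])
        elif op == "ADD":
            registers[args[0]] += value(args[1])
        elif op == "SUB":
            registers[args[0]] -= value(args[1])
        elif op == "JNZ":
            if registers[args[0]] != 0:
                pc = labels[args[1]]
        elif op == "PRINT":
            output.append(value(args[0]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


# Sum the numbers 5 + 4 + 3 + 2 + 1 with a countdown loop.
PROGRAM = """
MOV total 0
MOV n 5
loop:
ADD total n
SUB n 1
JNZ n loop
PRINT total
"""

print(run(PROGRAM))  # [15]
```

Because every line is executed directly from its text form, there is no separate compile step, which matches the interpreted approach described above.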

Any feedback is greatly appreciated! I’d love to hear thoughts on its potential and any suggestions for improvements.

https://github.com/KuriWasTaken/AXM

Edit: I know the code is very badly formatted and I should add more comments. I will fix this.

0 Comments
2024/11/13
19:17 UTC

65

Flask 3.1.0 Released

https://flask.palletsprojects.com/en/stable/changes/#version-3-1-0

  • Drop support for Python 3.8. #5623
  • Update minimum dependency versions to latest feature releases. Werkzeug >= 3.1, ItsDangerous >= 2.2, Blinker >= 1.9. #5624, #5633
  • Provide a configuration option to control automatic option responses. #5496
  • Flask.open_resource/open_instance_resource and Blueprint.open_resource take an encoding parameter to use when opening in text mode. It defaults to utf-8. #5504
  • Request.max_content_length can be customized per-request instead of only through the MAX_CONTENT_LENGTH config. Added MAX_FORM_MEMORY_SIZE and MAX_FORM_PARTS config. Added documentation about resource limits to the security page. #5625
  • Add support for the Partitioned cookie attribute (CHIPS), with the SESSION_COOKIE_PARTITIONED config. #5472
  • -e path takes precedence over default .env and .flaskenv files. load_dotenv loads default files in addition to a path unless load_defaults=False is passed. #5628
  • Support key rotation with the SECRET_KEY_FALLBACKS config, a list of old secret keys that can still be used for unsigning. Extensions will need to add support. #5621
  • Fix how setting host_matching=True or subdomain_matching=False interacts with SERVER_NAME. Setting SERVER_NAME no longer restricts requests to only that domain. #5553
  • Request.trusted_hosts is checked during routing, and can be set through the TRUSTED_HOSTS config. #5636
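A sketch of how several of the new config keys named above might be set together. The key names come from the changelog; every value is an illustrative placeholder, and the `Config` class is an assumption about how a project organizes its settings.

```python
class Config:
    """Illustrative settings using config keys introduced in Flask 3.1.0."""

    SECRET_KEY = "replace-with-new-secret"  # placeholder
    # Old keys that may still be used for unsigning while rotating secrets.
    SECRET_KEY_FALLBACKS = ["replace-with-old-secret"]  # placeholder

    # Resource limits for incoming requests and form data (placeholder values).
    MAX_CONTENT_LENGTH = 16 * 1024 * 1024
    MAX_FORM_MEMORY_SIZE = 500_000
    MAX_FORM_PARTS = 1_000

    # Adds the Partitioned attribute (CHIPS) to the session cookie.
    SESSION_COOKIE_PARTITIONED = True

    # Hosts accepted during routing; a leading dot is meant to cover subdomains.
    TRUSTED_HOSTS = ["example.com", ".example.com"]


# With Flask installed, the class could be loaded in the usual way:
#     from flask import Flask
#     app = Flask(__name__)
#     app.config.from_object(Config)
print(Config.SECRET_KEY_FALLBACKS)
```

Keeping the old secret in `SECRET_KEY_FALLBACKS` for a while lets existing sessions keep validating after `SECRET_KEY` changes.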
1 Comment
2024/11/13
18:35 UTC

44

extractous - fast data extraction with a rust core + tika native libs compiled through graalvm

Hello r/Python!

Thought I'd share extractous, a new document extraction library that processes documents up to 20x faster than existing solutions.

What The Project Does

Extractous is a high-performance document extraction library that processes PDFs, Word documents, HTML, and many other formats with native speed. It's built with a Rust core and uses GraalVM to compile Tika components to native code, eliminating the need for external services or JVM runtime.

Performance

  • Extracted Apple's 10-K filing in 320ms vs unstructured-io's 8.2s

  • Average 18x faster across SEC filings dataset

  • Significantly lower memory footprint

Quick Start

pip install extractous

from extractous import Extractor

extractor = Extractor()
result = extractor.extract_file_to_string("document.pdf")
print(result)

Target Audience

  • Anyone using tika-python or unstructured-io who needs better performance
  • Large-scale document processing
  • RAG (Retrieval Augmented Generation) pipelines
  • AI/ML document preprocessing

Comparison

  • tika-python - Popular Apache Tika binding. Extractous offers native performance without JVM overhead
  • unstructured-io - Popular document processing library. Extractous is 18x faster and uses significantly less memory
  • textract - Extractous provides similar functionality but with native speed and modern architecture

Features

  • Support for numerous formats (PDF, Word, HTML, Images with OCR, etc.)
  • Simple Python API
  • No external API services or JVM required
  • Free for commercial use (Apache 2.0)
  • Memory efficient through Rust ownership model

Coming Soon

  • XHTML output support

  • Enhanced file metadata extraction

  • GIL-bypassing batch processing API for parallel workloads

Repo
https://github.com/yobix-ai/extractous

Try it online (free)
https://www.extractous.com/

12 Comments
2024/11/13
17:22 UTC

0

What is the better programme to Learn

Which is better, Visual Studio Code or PyCharm?

In terms of tools and ease of use, I currently use PyCharm, but I find it difficult to organise files.

7 Comments
2024/11/13
17:20 UTC

364

uv after 0.5.0 - might be worth replacing Poetry/pyenv/pipx

uv is rapidly maturing as an open-source tool for Python project management. With recent versions 0.4.27 and 0.5.0 it has become full-featured, making it a strong alternative to Poetry, pyenv, and pipx. However, concerns exist over its long-term stability and licensing, given Astral's venture funding position.

https://open.substack.com/pub/martynassubonis/p/python-project-management-primer-a55

120 Comments
2024/11/13
16:39 UTC
