Intellectual Wanderings | Alistair Windsor: Mathematician, Educator, and sometimes Mathematical Educator

21
Feb
2024

Python Libraries for Data Scientists

Interesting article mostly about sharing and visualizing ML processes.

Article:

python.plainenglish.io/5-python-libraries-every-data-scientist-should-know-about-ce04bf19d58d

The article covers five Python libraries that are useful once a data scientist has chosen a machine learning library and is comfortable picking the right architecture for a model: MLflow, Streamlit, FastAPI, XGBoost, and ELI5. Each library gets a brief explanation of its purpose and benefits, and the author concludes by listing the advantages of knowing them. The main points are that these libraries can make a data scientist more competitive, help them build full-stack projects, and make their models more interpretable.

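The five libraries given are:

MLflow, for tracking experiments and managing models
Streamlit, for building interactive web apps around data and models
FastAPI, for serving models through a web API
XGBoost, for gradient-boosted decision trees
ELI5, for inspecting and explaining model predictions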

16
Feb
2023

Reinforcement Learning

There is a new edition of the “classic” text for reinforcement learning and it is freely available.

http://www.incompleteideas.net/book/RLbook2020trimmed.pdf

14
Nov
2022

Python Options for Converting HTML to Text

Googling “extract text from webpage using python” will get you a huge number of articles explaining how to use Requests and BeautifulSoup to automate text extraction from webpages. Almost all of these articles produce terrible output that requires a lot of cleaning. Some do elementary filtering on the DOM to exclude some text, but very few filter carefully enough to return only the main content of the page; most also return headers, sidebars, and footer information.
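To make the problem concrete, here is a minimal sketch that uses only the standard library's `html.parser` as a stand-in for the usual BeautifulSoup `get_text()` recipe. The sample page, the `SKIPPED_TAGS` list, and the helper names are illustrative assumptions, not taken from any of those articles.

```python
from html.parser import HTMLParser

# Tags whose text is usually page furniture rather than content.
# This list is an illustrative assumption, not a complete one.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextCollector(HTMLParser):
    """Collect text nodes, optionally skipping text inside SKIPPED_TAGS."""

    def __init__(self, skip_boilerplate_tags: bool = False):
        super().__init__()
        self.skip = skip_boilerplate_tags
        self.depth = 0  # number of skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.skip and tag in SKIPPED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.skip and tag in SKIPPED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth == 0:
            self.chunks.append(text)


def html_to_text(html: str, skip_boilerplate_tags: bool = False) -> str:
    """Return the page text, one text node per line."""
    parser = TextCollector(skip_boilerplate_tags)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A made-up page with the typical header / nav / footer clutter.
SAMPLE = """
<html><body>
<header>My Site | Home | About</header>
<nav>Archive Contact</nav>
<article><p>This is the paragraph a reader actually came for.</p></article>
<footer>Copyright 2022</footer>
</body></html>
"""

print(html_to_text(SAMPLE))                              # everything, clutter included
print(html_to_text(SAMPLE, skip_boilerplate_tags=True))  # only the article paragraph
```

Filtering by tag name only works when a page uses semantic tags. Many real pages put their sidebars and footers in plain `div` elements, which is why tag filtering alone is rarely enough.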

For most purposes this is not text you want to scrape. I used to use jusText (GitHub fork and Original) but have recently come across another, more complete solution: Trafilatura.

jusText is only an HTML-to-text converter. It extracts text that can then be saved. To scrape a website you will also need to use Requests or Selenium to fetch the pages. Trafilatura is both a crawler and an extractor, with multiple output formats.
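A rough sketch of the difference, assuming both packages are installed (`pip install justext trafilatura`). The wrapper function names are hypothetical. The library calls (`justext.justext`, `justext.get_stoplist`, `trafilatura.fetch_url`, `trafilatura.extract`) are the basic entry points described in each project's documentation, but check them against the versions you have.

```python
from typing import Optional


def extract_with_justext(html: str) -> str:
    """Convert already-downloaded HTML to text with jusText.

    jusText does no downloading; the caller must fetch the page first.
    """
    # Imported inside the function so this module still loads when only
    # one of the two libraries is installed.
    import justext

    paragraphs = justext.justext(html, justext.get_stoplist("English"))
    # Keep only the paragraphs jusText did not classify as boilerplate.
    return "\n".join(p.text for p in paragraphs if not p.is_boilerplate)


def extract_with_trafilatura(html: str) -> Optional[str]:
    """Convert already-downloaded HTML to text with Trafilatura.

    Returns None when no main content could be extracted.
    """
    import trafilatura

    return trafilatura.extract(html)


def fetch_and_extract(url: str) -> Optional[str]:
    """Download a page and extract its main text using Trafilatura alone."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download failed
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)
```

With jusText the download step is your own responsibility, while `fetch_and_extract` needs nothing beyond Trafilatura. `trafilatura.extract` also takes an `output_format` argument for choosing a format other than plain text.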

Here is a YouTube video introduction (there is no voice, so you can mute the annoying music):
