Interesting article mostly about sharing and visualizing ML processes.
Article:
python.plainenglish.io/5-python-libraries-every-data-scientist-should-know-about-ce04bf19d58d
The article covers five Python libraries worth learning once a data scientist has chosen a machine learning library and is comfortable choosing the right architecture for a model: MLflow, Streamlit, FastAPI, XGBoost, and ELI5. Each library is introduced with a brief explanation of its purpose and benefits. The author concludes that knowing these libraries can make a data scientist more competitive, help them build full-stack projects, and make their models more interpretable.
There is a new edition of the “classic” text for reinforcement learning and it is freely available.
http://www.incompleteideas.net/book/RLbook2020trimmed.pdf
Googling “extract text from webpage using python” will get you a huge number of articles explaining how to use Requests and BeautifulSoup to automate text extraction from webpages. Almost all of these articles produce terrible output that requires a lot of cleaning. Some do elementary filtering on the DOM to exclude part of the text, but very few filter carefully enough to return only the main content of the page. Most also return headers, sidebars, and footer information.
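To make the problem concrete, here is a minimal sketch using only the standard library's html.parser as a stand-in for BeautifulSoup. The sample page, the TextCollector class, and the list of tags to skip are all invented for illustration. Real pages rarely mark up their boilerplate this cleanly, which is why tag-based filtering usually falls short.

```python
from html.parser import HTMLParser

# Hypothetical sample page, made up for this illustration.
SAMPLE_HTML = """
<html><body>
<header>Site name | Home | About</header>
<nav>Archive Categories Tags</nav>
<article><p>This is the main content of the post.</p></article>
<aside>Recent comments</aside>
<footer>Copyright notice</footer>
</body></html>
"""

# Assumed set of tags that usually hold boilerplate rather than content.
SKIP_TAGS = {"header", "nav", "aside", "footer", "script", "style"}


class TextCollector(HTMLParser):
    """Collect text nodes, optionally ignoring text inside some tags."""

    def __init__(self, skip_tags=()):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html, skip_tags=()):
    """Return the page text, one chunk per line."""
    parser = TextCollector(skip_tags)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


naive_text = extract_text(SAMPLE_HTML)                # keeps header, nav, footer
filtered_text = extract_text(SAMPLE_HTML, SKIP_TAGS)  # elementary DOM filtering

print(naive_text)
print("---")
print(filtered_text)
```

The naive version returns the navigation, sidebar, and footer text along with the article, while the filtered version keeps only the paragraph inside the article element.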
For most purposes this is not text you want to scrape. I used to use jusText (GitHub fork and Original) but have recently come across a more complete solution, Trafilatura.
jusText is only an HTML-to-text converter. It extracts text that can then be saved. To scrape a website you will also need to use Requests or Selenium to fetch the pages. Trafilatura is both a crawler and an extractor, with multiple output formats.
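As a rough sketch of the Trafilatura workflow, the snippet below wraps its fetch_url and extract functions. The helper names extract_main_text and scrape are my own, and the import guard is only there so the snippet still loads when Trafilatura is not installed (pip install trafilatura).

```python
try:
    import trafilatura  # third-party package
except ImportError:  # keep the sketch importable without it
    trafilatura = None


def extract_main_text(html, output_format="txt"):
    """Extract the main content from an HTML string.

    Returns None if Trafilatura is missing or finds nothing to extract.
    """
    if trafilatura is None:
        return None
    return trafilatura.extract(html, output_format=output_format)


def scrape(url, output_format="txt"):
    """Download a page and return its main content, or None on failure."""
    if trafilatura is None:
        return None
    downloaded = trafilatura.fetch_url(url)  # None if the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded, output_format=output_format)
```

Calling scrape with the address of a blog post should return the body text without the navigation and footer. Passing output_format="xml" or "json" asks for a structured result instead of plain text.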
Here is a YouTube video introduction (there is no voice, so you can mute the annoying music):