
    Text classification with scikit-learn

    Posted by coder4 on 2023-10-09 04:41:36
    import numpy as np
    import pandas as pd

    df = pd.read_csv('./smsspamcollection.tsv', sep='\t')
    df.head()
    df['label'].value_counts()

    # drop rows with missing values before selecting X and y,
    # so the features and labels do not keep the dropped rows
    df.dropna(inplace=True)

    # split data set
    from sklearn.model_selection import train_test_split
    X = df['message']
    y = df['label']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

    # pipeline
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    text_clf = Pipeline([('tfidf', TfidfVectorizer()), […]
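The excerpt is cut off partway through the pipeline definition, so the rest of the original code is not known. As a rough, self-contained sketch of how a TF-IDF plus LinearSVC pipeline is typically fitted and used, the following may help. Several things here are assumptions rather than the author's code: the second pipeline step (named `'clf'` here) is guessed from the `LinearSVC` import, the toy `messages` and `labels` lists stand in for the SMS spam file, and `random_state` and `stratify` are added only to make the tiny example repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the 'message' and 'label' columns.
messages = [
    "Win a free prize now, claim your cash reward",
    "Free entry, text WIN to claim your prize",
    "Urgent: you have won cash, call now to claim",
    "Claim your free ringtone reward today",
    "Are we still meeting for lunch tomorrow?",
    "I will call you when I get home",
    "Can you send me the notes from class?",
    "See you at the station around six",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Hold out a quarter of the data; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)

# Assumed pipeline: TF-IDF features feeding a linear SVM classifier.
# The step name 'clf' is illustrative, not taken from the original post.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Fitting the pipeline vectorizes the raw text and trains the classifier.
text_clf.fit(X_train, y_train)

# Prediction takes raw strings; the pipeline applies the same vectorizer.
predictions = text_clf.predict(X_test)
print(predictions)
```

Wrapping the vectorizer and the classifier in one `Pipeline` means the vocabulary is learned from the training split only, and new messages can be passed to `predict` as plain strings.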

