博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:
阅读量:5826 次
发布时间:2019-06-18

本文共 7059 字,大约阅读时间需要 23 分钟。

[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:

mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\

.option("dbtable","accounts").option("user","training").option("password","training").load()

 

In [10]: mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\

....: .option("dbtable","accounts").option("user","training").option("password","training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.

In [11]:

In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame

In [12]: mydf001.count()

17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 32 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761

In [13]:

本文转自健哥的数据花园博客园博客,原文链接:http://www.cnblogs.com/gaojian/p/7624493.html,如需转载请自行联系原作者

你可能感兴趣的文章
Go开发之路(目录)
查看>>
RHEL6.5安装成功ORACLE11GR2之后,编写PROC程序出错解决方法
查看>>
(50)与magento集成
查看>>
Ubuntu设置python3为默认版本
查看>>
JsonCpp 的使用
查看>>
问题账户需求分析
查看>>
JavaSE-代码块
查看>>
爬取所有校园新闻
查看>>
32、SpringBoot-整合Dubbo
查看>>
python面向对象基础
查看>>
HDU 2044 一只小蜜蜂(递归)
查看>>
docker 下 安装rancher 笔记
查看>>
spring两大核心对象IOC和AOP(新手理解)
查看>>
数据分析相关
查看>>
Python LDAP中的时间戳转换为Linux下时间
查看>>
微信小程序蓝牙连接小票打印机
查看>>
环境错误2
查看>>
C++_了解虚函数的概念
查看>>
全新jmeter视频已经上架
查看>>
Windows 8下如何删除无线配置文件
查看>>