<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Brent | Never Forget Why You Started</title>
  
  <subtitle>When you are quietly striving, your spirit rises above the trivia of everyday life</subtitle>
  <link href="/atom.xml" rel="self"/>
  
  <link href="cpeixin.cn/"/>
  <updated>2021-01-12T15:55:54.038Z</updated>
  <id>cpeixin.cn/</id>
  
  <author>
    <name>Brent</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Hive CLI Hangs on Startup</title>
    <link href="cpeixin.cn/2021/01/12/Hive-Cli-%E5%90%AF%E5%8A%A8%E5%8D%A1%E6%AD%BB%E7%9A%84%E9%97%AE%E9%A2%98/"/>
    <id>cpeixin.cn/2021/01/12/Hive-Cli-%E5%90%AF%E5%8A%A8%E5%8D%A1%E6%AD%BB%E7%9A%84%E9%97%AE%E9%A2%98/</id>
    <published>2021-01-12T15:54:22.000Z</published>
    <updated>2021-01-12T15:55:54.038Z</updated>
    
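    <content type="html"><![CDATA[<p>Background of the problem:<br><img src="https://cdn.nlark.com/yuque/0/2021/png/1072113/1610466021704-a9182060-27fd-4c09-83a3-ff899f27f74c.png#align=left&display=inline&height=408&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2021-01-12%20%E4%B8%8B%E5%8D%8811.38.44.png&originHeight=408&originWidth=1098&size=182541&status=done&style=none&width=1098" alt="Screenshot 2021-01-12 at 11.38.44 PM.png"></p><p>Most of my colleagues query Hive data through the Hue platform. Some who know Hive well use the Hive CLI directly, because it shows the detailed Running progress of a query. This was the first time I had seen the Hive CLI hang like this.</p><p>My first thought was to check whether the Hive service itself was healthy. In Ambari, the Hive service looked normal and had raised no alerts.</p><p>Note that the Hive on our cluster uses Tez as its execution engine. When the Hive CLI starts, it opens a Tez session, and Tez requests resources from YARN for that session. If YARN does not have enough free resources, the CLI hangs exactly like this.</p><p>So I checked the resource usage in YARN:</p><p><img src="https://cdn.nlark.com/yuque/0/2021/png/1072113/1610466580341-e39487be-e0db-4b8f-812c-b4c6289c99ed.png#align=left&display=inline&height=157&margin=%5Bobject%20Object%5D&name=image.png&originHeight=314&originWidth=378&size=13720&status=done&style=none&width=189" alt="image.png"></p><p>At this point the cause was more or less clear. To verify it, I opened a new terminal window and ran:</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">hive -hiveconf hive.execution.engine&#x3D;mr</span><br></pre></td></tr></table></figure><p>This temporarily switches the execution engine to MapReduce, and the CLI then starts without any problem.</p><p>The root cause is this. When Hive's engine is Tez, the CLI must obtain YARN resources for the Tez session before it can run anything, so if resources stay insufficient it keeps waiting. MapReduce requests nothing from YARN at startup, so the CLI does not hang in the same way. MapReduce queries still need YARN resources once they actually run.</p>]]></content>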
    
    <summary type="html">
    
      
      
        &lt;p&gt;Background of the problem:&lt;br&gt;&lt;img src=&quot;https://cdn.nlark.com/yuque/0/2021/png/1072113/1610
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="Hive" scheme="cpeixin.cn/tags/Hive/"/>
    
  </entry>
  
  <entry>
    <title>MapReduce Source Code Walkthrough (Part 1)</title>
    <link href="cpeixin.cn/2020/12/06/MapReduce%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90-%E4%B8%80/"/>
    <id>cpeixin.cn/2020/12/06/MapReduce%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90-%E4%B8%80/</id>
    <published>2020-12-06T10:15:23.000Z</published>
    <updated>2020-12-06T10:17:27.118Z</updated>
    
    <content type="html"><![CDATA[<p>After downloading the Hadoop source code, I opened it directly in VS Code. At first I wanted to use IDEA so that I could step through the code in the debugger while following its execution flow. However, I kept running into Maven dependency problems, so in the end I simply read the raw source in VS Code. The version I chose is Hadoop 2.9.2.</p><h3 id="从提交任务开始"><a href="#从提交任务开始" class="headerlink" title="从提交任务开始"></a>Starting from Job Submission</h3><p>Path: hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/WordCount.java<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607242894835-22a2f6a2-a627-4e1b-9f5b-3805ece793da.png#align=left&display=inline&height=810&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%884.21.21.png&originHeight=810&originWidth=1194&size=249550&status=done&style=none&width=1194" alt="Screenshot 2020-12-06 at 4.21.21 PM.png"><br>Command + click the method under the cursor to jump to its definition:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607243545223-29e5c933-4258-4b25-97e4-cc1cc5091990.png#align=left&display=inline&height=1104&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%884.32.08.png&originHeight=1104&originWidth=1332&size=270055&status=done&style=none&width=1332" alt="Screenshot 2020-12-06 at 4.32.08 PM.png"><br>Next, jump into the <strong>submit()</strong> method:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607244046301-c2b93f9a-3ada-4e99-a3fd-103d93d62c56.png#align=left&display=inline&height=830&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%884.40.35.png&originHeight=830&originWidth=1338&size=238978&status=done&style=none&width=1338" alt="Screenshot 2020-12-06 at 4.40.35 PM.png"><br>Then step into <strong>submitJobInternal()</strong>. This chunk of code is fairly long, so let's focus on <strong>writeSplits()</strong> first.</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">JobStatus submitJobInternal(Job job, Cluster cluster) &#x2F;&#x2F; internal method that submits the job to the system</span><br><span class="line">  throws ClassNotFoundException, InterruptedException, IOException &#123;</span><br><span class="line"></span><br><span class="line">    &#x2F;&#x2F;validate the jobs output specs (i.e. check the output path)</span><br><span class="line">    checkSpecs(job);</span><br><span class="line"></span><br><span class="line">    Configuration conf &#x3D; job.getConfiguration(); &#x2F;&#x2F; get the job configuration</span><br><span class="line">    addMRFrameworkToDistributedCache(conf); &#x2F;&#x2F; add the MapReduce framework archive (if configured) to the distributed cache</span><br><span class="line"></span><br><span class="line">    Path jobStagingArea &#x3D; JobSubmissionFiles.getStagingDir(cluster, conf);</span><br><span class="line">    &#x2F;&#x2F;configure the command line options correctly on the submitting dfs</span><br><span class="line">    InetAddress ip &#x3D; InetAddress.getLocalHost();</span><br><span class="line">    if (ip !&#x3D; null) &#123;</span><br><span class="line">      submitHostAddress &#x3D; ip.getHostAddress();</span><br><span class="line">      submitHostName &#x3D; ip.getHostName();</span><br><span class="line">      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);</span><br><span class="line">      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);</span><br><span class="line">    &#125;</span><br><span class="line">    JobID jobId &#x3D; submitClient.getNewJobID();</span><br><span class="line">    job.setJobID(jobId);</span><br><span class="line">    Path submitJobDir &#x3D; new Path(jobStagingArea, jobId.toString());</span><br><span class="line">    JobStatus status &#x3D; null;</span><br><span class="line">    try &#123;</span><br><span class="line">      conf.set(MRJobConfig.USER_NAME,</span><br><span class="line">          UserGroupInformation.getCurrentUser().getShortUserName());</span><br><span class="line">      conf.set(&quot;hadoop.http.filter.initializers&quot;, </span><br><span
class="line">          &quot;org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer&quot;);</span><br><span class="line">      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());</span><br><span class="line">      LOG.debug(&quot;Configuring job &quot; + jobId + &quot; with &quot; + submitJobDir </span><br><span class="line">          + &quot; as the submit dir&quot;);</span><br><span class="line">      &#x2F;&#x2F; get delegation token for the dir</span><br><span class="line">      TokenCache.obtainTokensForNamenodes(job.getCredentials(),</span><br><span class="line">          new Path[] &#123; submitJobDir &#125;, conf);</span><br><span class="line">      </span><br><span class="line">      populateTokenCache(conf, job.getCredentials());</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; generate a secret to authenticate shuffle transfers</span><br><span class="line">      if (TokenCache.getShuffleSecretKey(job.getCredentials()) &#x3D;&#x3D; null) &#123;</span><br><span class="line">        KeyGenerator keyGen;</span><br><span class="line">        try &#123;</span><br><span class="line">          keyGen &#x3D; KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);</span><br><span class="line">          keyGen.init(SHUFFLE_KEY_LENGTH);</span><br><span class="line">        &#125; catch (NoSuchAlgorithmException e) &#123;</span><br><span class="line">          throw new IOException(&quot;Error generating shuffle secret key&quot;, e);</span><br><span class="line">        &#125;</span><br><span class="line">        SecretKey shuffleKey &#x3D; keyGen.generateKey();</span><br><span class="line">        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),</span><br><span class="line">            job.getCredentials());</span><br><span class="line">      &#125;</span><br><span class="line">      if (CryptoUtils.isEncryptedSpillEnabled(conf)) &#123;</span><br><span
class="line">        conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);</span><br><span class="line">        LOG.warn(&quot;Max job attempts set to 1 since encrypted intermediate&quot; +</span><br><span class="line">                &quot;data spill is enabled&quot;);</span><br><span class="line">      &#125;</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; copy the job jar, files, libjars and archives to the staging dir on HDFS;</span><br><span class="line">      &#x2F;&#x2F; by default they are written with replication 10 (mapreduce.client.submit.file.replication),</span><br><span class="line">      &#x2F;&#x2F; so the copies are spread across different nodes</span><br><span class="line">      copyAndConfigureFiles(job, submitJobDir);</span><br><span class="line"></span><br><span class="line">      Path submitJobFile &#x3D; JobSubmissionFiles.getJobConfPath(submitJobDir);</span><br><span class="line">      </span><br><span class="line">      &#x2F;&#x2F; Create the splits for the job</span><br><span class="line">      LOG.debug(&quot;Creating splits at &quot; + jtFs.makeQualified(submitJobDir));</span><br><span class="line">      &#x2F;&#x2F; compute the splits (and thus the number of maps) and write all split info under submitJobDir</span><br><span class="line">      int maps &#x3D; writeSplits(job, submitJobDir);</span><br><span class="line">      conf.setInt(MRJobConfig.NUM_MAPS, maps);</span><br><span class="line">      LOG.info(&quot;number of splits:&quot; + maps);</span><br><span class="line"></span><br><span class="line">      int maxMaps &#x3D; conf.getInt(MRJobConfig.JOB_MAX_MAP,</span><br><span class="line">          MRJobConfig.DEFAULT_JOB_MAX_MAP);</span><br><span class="line">      if (maxMaps &gt;&#x3D; 0 &amp;&amp; maxMaps &lt; maps) &#123;</span><br><span class="line">        throw new IllegalArgumentException(&quot;The number of map tasks &quot; + maps +</span><br><span class="line">            &quot; exceeded limit &quot; + maxMaps);</span><br><span class="line">      &#125;</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; write &quot;queue admins of the queue to which job is being submitted&quot;</span><br><span class="line">      &#x2F;&#x2F; to job file.</span><br><span class="line">      String queue &#x3D; conf.get(MRJobConfig.QUEUE_NAME,</span><br><span
class="line">          JobConf.DEFAULT_QUEUE_NAME);</span><br><span class="line">      AccessControlList acl &#x3D; submitClient.getQueueAdmins(queue);</span><br><span class="line">      conf.set(toFullPropertyName(queue,</span><br><span class="line">          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; removing jobtoken referrals before copying the jobconf to HDFS</span><br><span class="line">      &#x2F;&#x2F; as the tasks don&#39;t need this setting, actually they may break</span><br><span class="line">      &#x2F;&#x2F; because of it if present as the referral will point to a</span><br><span class="line">      &#x2F;&#x2F; different job.</span><br><span class="line">      TokenCache.cleanUpTokenReferral(conf); &#x2F;&#x2F; remove the job-token referral from the conf</span><br><span class="line"></span><br><span class="line">      if (conf.getBoolean(</span><br><span class="line">          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,</span><br><span class="line">          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) &#123;</span><br><span class="line">        &#x2F;&#x2F; Add HDFS tracking ids</span><br><span class="line">        ArrayList&lt;String&gt; trackingIds &#x3D; new ArrayList&lt;String&gt;();</span><br><span class="line">        for (Token&lt;? extends TokenIdentifier&gt; t :</span><br><span class="line">            job.getCredentials().getAllTokens()) &#123;</span><br><span class="line">          trackingIds.add(t.decodeIdentifier().getTrackingId());</span><br><span class="line">        &#125;</span><br><span class="line">        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,</span><br><span class="line">            trackingIds.toArray(new String[trackingIds.size()]));</span><br><span class="line">      &#125;</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; Set reservation info if it exists</span><br><span class="line">      ReservationId reservationId &#x3D; job.getReservationId();</span><br><span class="line">      if (reservationId !&#x3D; null) &#123;</span><br><span class="line">        conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());</span><br><span class="line">      &#125;</span><br><span class="line"></span><br><span class="line">      &#x2F;&#x2F; Write job file to submit dir</span><br><span class="line">      writeConf(conf, submitJobFile);</span><br><span class="line">      </span><br><span class="line">      &#x2F;&#x2F;</span><br><span class="line">      &#x2F;&#x2F; Now, actually submit the job (using the submit name)</span><br><span class="line">      &#x2F;&#x2F; This is where the job is formally submitted.</span><br><span class="line">      &#x2F;&#x2F;</span><br><span class="line">      printTokens(jobId, job.getCredentials());</span><br><span class="line">      status &#x3D; submitClient.submitJob(</span><br><span class="line">          jobId, submitJobDir.toString(), job.getCredentials());</span><br><span class="line">      if (status !&#x3D; null) &#123;</span><br><span class="line">        return status;</span><br><span class="line">      &#125; else &#123;</span><br><span class="line">        throw new IOException(&quot;Could not launch job&quot;);</span><br><span class="line">      &#125;</span><br><span class="line">    &#125; finally &#123;</span><br><span
class="line">      if (status &#x3D;&#x3D; null) &#123;</span><br><span class="line">        LOG.info(&quot;Cleaning up the staging area &quot; + submitJobDir);</span><br><span class="line">        if (jtFs !&#x3D; null &amp;&amp; submitJobDir !&#x3D; null)</span><br><span class="line">          jtFs.delete(submitJobDir, true);</span><br><span class="line"></span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br></pre></td></tr></table></figure>
<p><strong>writeSplits()</strong> splits the job's input data and returns the number of map tasks.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607247032830-1bcbeb89-f237-406c-85c9-f041db06595b.png#align=left&display=inline&height=446&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%885.30.22.png&originHeight=446&originWidth=1128&size=107698&status=done&style=none&width=1128" alt="Screenshot 2020-12-06 at 5.30.22 PM.png"></p><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607247085102-75854066-3ab2-46d8-939d-31c78fea9d5e.png#align=left&display=inline&height=668&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%885.31.15.png&originHeight=668&originWidth=1228&size=197338&status=done&style=none&width=1228" alt="Screenshot 2020-12-06 at 5.31.15 PM.png"><br><strong>Jump into getSplits()</strong><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607247940507-543627f9-1265-4190-8b53-d9e31d3f45c6.png#align=left&display=inline&height=1940&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%885.45.29.png&originHeight=1940&originWidth=2010&size=685929&status=done&style=none&width=2010" alt="Screenshot 2020-12-06 at 5.45.29 PM.png"><br>One part of the code above was a blind spot for me: <strong>while (((double) bytesRemaining)/splitSize &gt; SPLIT_SLOP)</strong>.<br><strong>SPLIT_SLOP</strong> is a constant equal to 1.1. As a file is split piece by piece, the loop keeps cutting off another split of splitSize bytes as long as the remaining bytes divided by splitSize is greater than 1.1. Once the ratio drops to 1.1 or below, the remainder is kept as the last split and is not cut again. For example, with a 128 MB split size (by default the split size equals the block size), a remainder of up to 140.8 MB (128 MB × 1.1) becomes a single split.</p><p>Now step back out of this method and continue with <strong>writeNewSplits()</strong>. Note that getSplits() only divides the files into logical byte ranges; it does not physically cut them. writeNewSplits() then sorts the splits by size and writes them to the split file. It then obtains the split metadata as a SplitMetaInfo array (info) and finally returns the number of maps. The map-side parallelism of a job is therefore determined by the number of splits the client computes when submitting the job: there is one MapTask per split.</p><p>Back in submitJobInternal():<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1607249258725-0335a2f7-da4e-426b-be2d-4f71a573ca33.png#align=left&display=inline&height=902&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-12-06%20%E4%B8%8B%E5%8D%886.07.23.png&originHeight=902&originWidth=1118&size=155252&status=done&style=none&width=1118" alt="Screenshot 2020-12-06 at 6.07.23 PM.png"><br>submitJob() submits the job to the cluster.</p>]]></content>
    
    <summary type="html">
    
      
      
        &lt;p&gt;After downloading the Hadoop source code, I opened it directly in VS Code. At first I wanted to use IDEA so that I could step through the code in the debugger while following its execution flow, but I kept running into Maven dependency
      
    
    </summary>
    
    
      <category term="源码系列" scheme="cpeixin.cn/categories/%E6%BA%90%E7%A0%81%E7%B3%BB%E5%88%97/"/>
    
    
      <category term="mapreduce" scheme="cpeixin.cn/tags/mapreduce/"/>
    
  </entry>
  
  <entry>
    <title>A New Model for Big Data Clusters: Separating Storage and Compute</title>
    <link href="cpeixin.cn/2020/11/29/%E5%A4%A7%E6%95%B0%E6%8D%AE%E9%9B%86%E7%BE%A4%E6%96%B0%E6%A8%A1%E5%BC%8F-%E5%AD%98%E7%AE%97%E5%88%86%E7%A6%BB/"/>
    <id>cpeixin.cn/2020/11/29/%E5%A4%A7%E6%95%B0%E6%8D%AE%E9%9B%86%E7%BE%A4%E6%96%B0%E6%A8%A1%E5%BC%8F-%E5%AD%98%E7%AE%97%E5%88%86%E7%A6%BB/</id>
    <published>2020-11-29T15:02:43.000Z</published>
    <updated>2020-11-29T15:04:33.591Z</updated>
    
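    <content type="html"><![CDATA[<p>Our company's big data cluster is being migrated to Tencent Cloud. It is moving from the original HDP distribution on physical machines to a storage-compute separated model based on EMR + COS (object storage). So what exactly is storage-compute separation?</p><p>Hadoop was born with storage and compute together. A few years ago, a common interview question was: how does Hadoop achieve high performance? One reason given was that the data stays put while the computation (code) moves to it. Now we have reached the stage of separating compute from storage. <strong>Storage-compute separation is a layered architecture: storage and compute capabilities are split apart, each offered as a service, and connected by a high-speed network.</strong></p><blockquote><p>Cloud Object Storage (COS) is a distributed storage service from Tencent Cloud with no directory hierarchy and no restrictions on data format. It can hold massive amounts of data and is accessed over HTTP/HTTPS. A COS bucket has no capacity limit and needs no partition management. This makes it suitable for scenarios such as CDN content distribution, processing with Cloud Infinite, and data lakes for big data computing and analytics. COS provides a web console, SDKs for many mainstream programming languages, APIs, and command-line and GUI tools. It is also compatible with the S3 API, so community tools and plugins can be used directly. Under the hood it uses YottaStore, Tencent's in-house storage engine. YottaStore can in theory manage millions of nodes in a single cluster, scales truly on demand, and achieves disk utilization above 90%.</p></blockquote><h2 id="为什么要存算分离"><a href="#为什么要存算分离" class="headerlink" title="为什么要存算分离"></a>Why Separate Storage and Compute</h2><p>Take Hadoop as an example. In a traditional Hadoop deployment, storage and compute are inseparable. As the business grows, expanding storage often forces you to add compute you don't need, which is wasteful. Likewise, adding nodes only to increase compute power leaves storage underused for a while. Separating compute and storage lets you scale each independently to address whichever one is actually short.</p><h2 id="存储与计算分离的趋势"><a href="#存储与计算分离的趋势" class="headerlink" title="存储与计算分离的趋势"></a>The Trend Toward Separating Storage and Compute</h2><p>In 2009, the key word for large-scale computing was “Locality”: keep computation as close to the data as possible to improve efficiency. The widely accepted model at the time was to build a sufficiently large resource pool and combine data and compute inside it to gain economies of scale.</p><p>Over the last few years, however, the ecosystem and environment have quietly changed:</p><ul><li>Compute model: full-scan batch computation is gradually being overtaken by more efficient engines such as Impala and Presto</li><li>Storage formats: columnar storage and indexing technologies such as ORC, Parquet and Kudu mean computation no longer needs to scan all of the data</li><li>Network architecture: 25G networks are being rolled out, and technologies such as FPGAs are also speeding up networking</li><li>Storage media: SSD, AliFlash, 3D XPoint and many hybrid technologies make storage both fast and powerful</li><li>Compute platforms: GPUs, FPGAs and, in the future, even TPUs are changing the shape of computation</li></ul><p>These changes show that the approach of using one machine type for both storage and compute is giving way to a trend where storage and compute are each provided as services connected by a high-speed network:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1606655257980-b9e9df62-80b4-4b0c-b1e5-f232ae65af0b.png#align=left&display=inline&height=176&margin=%5Bobject%20Object%5D&name=image.png&originHeight=351&originWidth=765&size=46106&status=done&style=none&width=382.5" alt="image.png"><br>This frees storage and compute from constraints such as server models, racks and power supply, so each can innovate in the area it does best.</p><blockquote><p>Note: for more on 25G networks, see <a href="https://cloud.tencent.com/developer/news/235145" target="_blank" rel="external nofollow noopener noreferrer">https://cloud.tencent.com/developer/news/235145</a></p></blockquote>
<h2 id="基本架构"><a href="#基本架构" class="headerlink" title="基本架构"></a>Basic Architecture</h2><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1606642499936-3a590b55-be4f-45c6-bb06-802807f7a41d.png#align=left&display=inline&height=109&margin=%5Bobject%20Object%5D&originHeight=109&originWidth=408&size=0&status=done&style=none&width=408" alt><br>The architecture is actually quite simple. OSS (object storage) serves as the default storage, and Hadoop and Spark act as compute engines that analyze the data stored in OSS directly.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1606642499939-2a502ebb-58c0-4ba4-9af6-7ade26e915c4.png#align=left&display=inline&height=246&margin=%5Bobject%20Object%5D&originHeight=246&originWidth=366&size=0&status=done&style=none&width=366" alt><br>The figure above compares the pros and cons of separating compute from storage.</p><p><strong>Flexibility:</strong> The article “E-MapReduce (Hadoop) Ten Categories of Issues: Cluster Planning” analyzes cluster planning, where the key is matching compute capacity to storage capacity. Once compute and storage are separated, cluster planning becomes much simpler: <strong>you basically no longer need to estimate the future scale of the business, and you can truly use resources on demand.</strong></p><p><strong>Cost:</strong> The comparison uses a cluster of 1 master (8 CPUs, 32 GB) and 6 slaves (8 CPUs, 32 GB each) with 10 TB of data. With storage and compute separated, the cost drops by roughly half. The self-built cluster on ECS used for comparison runs on ultra disks.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1606642499968-0f873159-d172-4561-9813-6831ac6651fd.png#align=left&display=inline&height=324&margin=%5Bobject%20Object%5D&originHeight=324&originWidth=491&size=0&status=done&style=none&width=491" alt></p><p><strong>Performance:</strong> As compatibility between EMR and OSS keeps improving and cloud networks get faster, OSS as the storage layer will show more and more of its advantages. Comparing the two approaches, the biggest drawback of OSS is that reading data is slower than with HDFS. One way to mitigate this to some extent is to use OSS for the initial input data and the final results, while keeping intermediate temporary data on HDFS. Of course, the best approach still depends on your actual situation. The performance drop is within roughly 10%, which is acceptable for typical applications.</p><h2 id="分析"><a href="#分析" class="headerlink" title="分析"></a>Analysis</h2><p>As we can see, with EMR + OSS the cost is cut in half while the performance drop is small. The performance chart below compares EMR + OSS against a self-built Hadoop cluster on ECS:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1606642755272-3d878e37-9234-421c-ae6a-c8f23a4cb0ed.png#align=left&display=inline&height=187&margin=%5Bobject%20Object%5D&originHeight=187&originWidth=844&size=0&status=done&style=none&width=844" alt></p><p>Overall, EMR + OSS uses fewer resources than the self-built cluster. With higher parallelism, it might even finish faster than the self-built Hadoop cluster on ECS.</p>]]></content>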
    
    <summary type="html">
    
      
      
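        &lt;p&gt;Our company's big data cluster is being migrated to Tencent Cloud. It is moving from the original HDP distribution on physical machines to a storage-compute separated model based on EMR + COS (object storage). So what exactly is storage-compute separation?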
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="COS" scheme="cpeixin.cn/tags/COS/"/>
    
  </entry>
  
  <entry>
    <title>Consuming Kafka with Flink and Configuring Its Parameters</title>
    <link href="cpeixin.cn/2020/11/21/Flink%E6%B6%88%E8%B4%B9Kafka%E4%BB%A5%E5%8F%8A%E5%8F%82%E6%95%B0%E8%AE%BE%E7%BD%AE/"/>
    <id>cpeixin.cn/2020/11/21/Flink%E6%B6%88%E8%B4%B9Kafka%E4%BB%A5%E5%8F%8A%E5%8F%82%E6%95%B0%E8%AE%BE%E7%BD%AE/</id>
    <published>2020-11-21T10:25:43.000Z</published>
    <updated>2020-11-21T10:39:46.895Z</updated>
    
    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p>在实时计算的场景下，绝大多数的数据源都是消息系统，而 Kafka 从众多的消息中间件中脱颖而出，主要是因为<strong>高吞吐</strong>、<strong>低延迟</strong>的特点；同时也讲了 Flink 作为生产者像 Kafka 写入数据的方式和代码实现。将从以下几个方面介绍 Flink 消费 Kafka 中的数据方式和源码实现。</p><h3 id="Kafka-连接-Flink"><a href="#Kafka-连接-Flink" class="headerlink" title="Kafka 连接 Flink"></a>Kafka 连接 Flink</h3><p>Flink 中支持了比较丰富的用来连接第三方的连接器，Kafka Connector 是 Flink 支持的各种各样的连接器中比较完善的之一。</p><p>Flink 提供了专门的 Kafka 连接器，向 Kafka Topic 中读取或者写入数据。Flink Kafka Consumer 集成了 Flink 的 Checkpoint 机制，可提供 exactly-once 的处理语义。为此，Flink 并不完全依赖于跟踪 Kafka 消费组的偏移量，而是在内部跟踪和检查偏移量。</p><p>同时也提过，我们在使用 Kafka 连接器时需要引用相对应的 Jar 包依赖。对于某些连接器比如 Kafka 是有版本要求的，一定要去官方网站找到对应的依赖版本。我在下表中给出了不同版本的 Kafka，以及对应的 Connector 关系：<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1605948621977-67935dc7-b002-4650-9256-b6298b68c5ce.png#align=left&display=inline&height=696&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1392&originWidth=1514&size=276244&status=done&style=none&width=757" alt="image.png"></p><h4 id="Kafka-本地环境搭建"><a href="#Kafka-本地环境搭建" class="headerlink" title="Kafka 本地环境搭建"></a>Kafka 本地环境搭建</h4><p>我们在本地环境搭建一个 Kafka_2.11-2.1.0 版本的 Kafka 单机环境，然后模拟一些数据写入到队列中。</p><p>我们可以在这里下载对应版本的 Kafka，把压缩包进行解压，然后使用下面的命令启动单机版本的 Kafka。</p><p>解压：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">&gt; tar -xzf kafka_2.11-2.1.0.tgz</span><br><span class="line">&gt; cd kafka_2.11-2.1.0</span><br></pre></td></tr></table></figure><p>启动 ZooKeeper 和 Kafka Server：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">启动ZK：nohup bin&#x2F;zookeeper-server-start.sh config&#x2F;zookeeper.properties  &amp;</span><br><span class="line">启动Server: 
</span><br><span class="line">nohup bin&#x2F;kafka-server-start.sh config&#x2F;server.properties &amp;</span><br></pre></td></tr></table></figure><p>创建一个名为 test 的 Topic：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">bin&#x2F;kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test</span><br></pre></td></tr></table></figure><h4><a href="#" class="headerlink"></a></h4><h4 id="Kafka-Producer"><a href="#Kafka-Producer" class="headerlink" title="Kafka Producer"></a>Kafka Producer</h4><p>首先我们需要新增一个依赖，然后向名为 test 的 Topic 中写入数据。新增 Maven 依赖：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&lt;dependency&gt;</span><br><span class="line">   &lt;groupId&gt;org.apache.flink&lt;&#x2F;groupId&gt;</span><br><span class="line">   &lt;artifactId&gt;flink-connector-kafka_2.11&lt;&#x2F;artifactId&gt;</span><br><span class="line">   &lt;version&gt;1.10.0&lt;&#x2F;version&gt;</span><br><span class="line">&lt;&#x2F;dependency&gt;</span><br></pre></td></tr></table></figure><p>向Topic中写入数据：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line">public class KafkaProducer &#123;</span><br><span class="line"></span><br><span class="line">    public static void main(String[] args) throws Exception&#123;</span><br><span class="line"></span><br><span class="line">        StreamExecutionEnvironment env &#x3D; StreamExecutionEnvironment.getExecutionEnvironment();</span><br><span class="line"></span><br><span class="line">        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);</span><br><span class="line"></span><br><span class="line">        env.enableCheckpointing(5000);</span><br><span class="line"></span><br><span class="line">        DataStreamSource&lt;String&gt; text &#x3D; env.addSource(new MyNoParalleSource()).setParallelism(1);</span><br><span class="line"></span><br><span class="line">        Properties properties &#x3D; new Properties();</span><br><span class="line"></span><br><span class="line">        properties.setProperty(&quot;bootstrap.servers&quot;, &quot;127.0.0.1:9092&quot;);</span><br><span class="line"></span><br><span class="line">        &#x2F;&#x2F; 2.0 配置 KafkaProducer</span><br><span class="line"></span><br><span class="line">        FlinkKafkaProducer&lt;String&gt; producer &#x3D; new FlinkKafkaProducer&lt;String&gt;(</span><br><span class="line"></span><br><span class="line">     
           &quot;127.0.0.1:9092&quot;, &#x2F;&#x2F; broker list</span><br><span class="line"></span><br><span class="line">                &quot;test&quot;,           &#x2F;&#x2F; topic</span><br><span class="line"></span><br><span class="line">                new SimpleStringSchema()); &#x2F;&#x2F; message serialization schema</span><br><span class="line"></span><br><span class="line">        &#x2F;&#x2F; attach the event timestamp of each record when writing to Kafka</span><br><span class="line"></span><br><span class="line">        producer.setWriteTimestampToKafka(true);</span><br><span class="line"></span><br><span class="line">        text.addSink(producer);</span><br><span class="line"></span><br><span class="line">        env.execute();</span><br><span class="line"></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Note that we use a custom class, MyNoParalleSource, here. It implements Flink's SourceFunction interface for user-defined sources and continuously generates test data. The code is as follows:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span 
class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br></pre></td><td class="code"><pre><span class="line">public class MyNoParalleSource implements SourceFunction&lt;String&gt; &#123;</span><br><span class="line"></span><br><span class="line">    &#x2F;&#x2F;private long count &#x3D; 1L;</span><br><span class="line"></span><br><span class="line">    private volatile boolean isRunning &#x3D; true; &#x2F;&#x2F; volatile: cancel() is called from a different thread than run()</span><br><span class="line"></span><br><span class="line">    &#x2F;**</span><br><span class="line"></span><br><span class="line">     * Main method of the source.</span><br><span class="line"></span><br><span class="line">     * Starts the source.</span><br><span class="line"></span><br><span class="line">     * In most cases run() contains a loop so that data is produced continuously.</span><br><span class="line"></span><br><span class="line">     *</span><br><span class="line"></span><br><span class="line">     * @param ctx</span><br><span class="line"></span><br><span class="line">     * @throws 
Exception</span><br><span class="line"></span><br><span class="line">     *&#x2F;</span><br><span class="line"></span><br><span class="line">    @Override</span><br><span class="line"></span><br><span class="line">    public void run(SourceContext&lt;String&gt; ctx) throws Exception &#123;</span><br><span class="line"></span><br><span class="line">        while(isRunning)&#123;</span><br><span class="line"></span><br><span class="line">            &#x2F;&#x2F;图书的排行榜</span><br><span class="line"></span><br><span class="line">            List&lt;String&gt; books &#x3D; new ArrayList&lt;&gt;();</span><br><span class="line"></span><br><span class="line">            books.add(&quot;Pyhton从入门到放弃&quot;);&#x2F;&#x2F;10</span><br><span class="line"></span><br><span class="line">            books.add(&quot;Java从入门到放弃&quot;);&#x2F;&#x2F;8</span><br><span class="line"></span><br><span class="line">            books.add(&quot;Php从入门到放弃&quot;);&#x2F;&#x2F;5</span><br><span class="line"></span><br><span class="line">            books.add(&quot;C++从入门到放弃&quot;);&#x2F;&#x2F;3</span><br><span class="line"></span><br><span class="line">            books.add(&quot;Scala从入门到放弃&quot;);</span><br><span class="line"></span><br><span class="line">            int i &#x3D; new Random().nextInt(5);</span><br><span class="line"></span><br><span class="line">            ctx.collect(books.get(i));</span><br><span class="line"></span><br><span class="line">            &#x2F;&#x2F;每2秒产生一条数据</span><br><span class="line"></span><br><span class="line">            Thread.sleep(2000);</span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    &#x2F;&#x2F;取消一个cancel的时候会调用的方法</span><br><span class="line"></span><br><span class="line">    @Override</span><br><span class="line"></span><br><span class="line">    public void cancel() &#123;</span><br><span 
class="line"></span><br><span class="line">        isRunning &#x3D; false;</span><br><span class="line"></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Flink-如何消费-Kafka"><a href="#Flink-如何消费-Kafka" class="headerlink" title="Flink 如何消费 Kafka"></a>Flink 如何消费 Kafka</h3><p>Flink 在和 Kafka 对接的过程中，跟 Kafka 的版本是强相关的。我们在使用 Kafka 连接器时需要引用相对应的 Jar 包依赖，对于某些连接器比如 Kafka 是有版本要求的，一定要去<a href="https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html" target="_blank" rel="external nofollow noopener noreferrer">官方网站</a>找到对应的依赖版本。<br>我们本地的 Kafka 版本是 2.1.0，所以需要对应的类是 FlinkKafkaConsumer。首先需要在 pom.xml 中引入 jar 包依赖：<br>复制</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&lt;dependency&gt;</span><br><span class="line">  &lt;groupId&gt;org.apache.flink&lt;&#x2F;groupId&gt;</span><br><span class="line">  &lt;artifactId&gt;flink-connector-kafka_2.11&lt;&#x2F;artifactId&gt;</span><br><span class="line">  &lt;version&gt;1.10.0&lt;&#x2F;version&gt;</span><br><span class="line">&lt;&#x2F;dependency&gt;</span><br></pre></td></tr></table></figure><p>下面将对 Flink 消费 Kafka 数据的方式进行分类讲解。</p><h4 id="消费单个-Topic"><a href="#消费单个-Topic" class="headerlink" title="消费单个 Topic"></a>消费单个 Topic</h4><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();</span><br><span class="line">    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);</span><br><span class="line">    env.enableCheckpointing(<span class="number">5000</span>);</span><br><span class="line">    Properties properties = <span class="keyword">new</span> Properties();</span><br><span class="line">    properties.setProperty(<span class="string">"bootstrap.servers"</span>, <span class="string">"127.0.0.1:9092"</span>);</span><br><span class="line">    <span class="comment">// For Kafka 0.8 (FlinkKafkaConsumer08), you also need to configure:</span></span><br><span class="line">    <span class="comment">//properties.setProperty("zookeeper.connect", "localhost:2181");</span></span><br><span class="line">    <span class="comment">// set the consumer group</span></span><br><span class="line">    properties.setProperty(<span class="string">"group.id"</span>, <span class="string">"group_test"</span>);</span><br><span class="line">    FlinkKafkaConsumer&lt;String&gt; consumer = <span class="keyword">new</span> FlinkKafkaConsumer&lt;&gt;(<span class="string">"test"</span>, 
<span class="keyword">new</span> SimpleStringSchema(), properties);</span><br><span class="line">    <span class="comment">// start consuming from the earliest offset</span></span><br><span class="line">    consumer.setStartFromEarliest();</span><br><span class="line">    <span class="comment">// alternatively, specify topic, partition and offset manually and start consuming from those positions</span></span><br><span class="line">    <span class="comment">//HashMap&lt;KafkaTopicPartition, Long&gt; map = new HashMap&lt;&gt;();</span></span><br><span class="line">    <span class="comment">//map.put(new KafkaTopicPartition("test", 1), 10240L);</span></span><br><span class="line">    <span class="comment">// if there are several partitions, specify a start offset for each of them</span></span><br><span class="line">    <span class="comment">//map.put(new KafkaTopicPartition("test", 2), 10560L);</span></span><br><span class="line">    <span class="comment">// each partition then starts consuming from its specified offset</span></span><br><span class="line">    <span class="comment">//consumer.setStartFromSpecificOffsets(map);</span></span><br><span class="line">    env.addSource(consumer).flatMap(<span class="keyword">new</span> FlatMapFunction&lt;String, String&gt;() &#123;</span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">flatMap</span><span class="params">(String value, Collector&lt;String&gt; out)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">            System.out.println(value);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;);</span><br><span class="line">    env.execute(<span class="string">"start consumer..."</span>);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When configuring Kafka consumption, you can explicitly specify the start offset for each partition of a topic, as the commented-out <code>setStartFromSpecificOffsets</code> lines above show.</p><h4 id="消费多个-Topic"><a href="#消费多个-Topic" class="headerlink" title="消费多个 Topic"></a>Consuming More Than One 
Topic</h4><p>In our business, the same kind of data is sometimes sent to different topics depending on its type. For example, online order data goes to two separate topics, one for mobile and one for PC, depending on where the order came from. We do not want to copy the same job just to point it at another topic, so what should we do?</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">Properties properties &#x3D; new Properties();</span><br><span class="line">properties.setProperty(&quot;bootstrap.servers&quot;, &quot;127.0.0.1:9092&quot;);</span><br><span class="line">&#x2F;&#x2F; For Kafka 0.8 (FlinkKafkaConsumer08), you also need to configure:</span><br><span class="line">&#x2F;&#x2F;properties.setProperty(&quot;zookeeper.connect&quot;, &quot;localhost:2181&quot;);</span><br><span class="line">&#x2F;&#x2F; set the consumer group</span><br><span class="line">properties.setProperty(&quot;group.id&quot;, &quot;group_test&quot;);</span><br><span class="line">&#x2F;&#x2F; the topics to subscribe to</span><br><span class="line">List&lt;String&gt; topics &#x3D; new ArrayList&lt;&gt;();</span><br><span class="line">topics.add(&quot;test_A&quot;);</span><br><span class="line">topics.add(&quot;test_B&quot;);</span><br><span class="line">&#x2F;&#x2F; pass the whole list to a single consumer</span><br><span class="line">FlinkKafkaConsumer&lt;String&gt; consumer &#x3D; new FlinkKafkaConsumer&lt;&gt;(topics, new SimpleStringSchema(), properties);</span><br><span class="line">...</span><br></pre></td></tr></table></figure><p>Passing a list of topics lets one consumer read from several topics. If you need to tell the records of the two topics apart, either add a field identifying the source to the data written to Kafka, or use a KafkaDeserializationSchema, whose <code>deserialize</code> method receives the full ConsumerRecord, including its topic.</p><h4 id="消息序列化"><a href="#消息序列化" 
class="headerlink" title="消息序列化"></a>Message Deserialization</h4><p>In the examples above we always used SimpleStringSchema to deserialize Kafka messages. Note that SimpleStringSchema returns only the raw message value, without metadata such as the topic or partition. To return a custom data structure, implement your own deserialization schema:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">public class CustomDeSerializationSchema implements KafkaDeserializationSchema&lt;ConsumerRecord&lt;String, String&gt;&gt; &#123;</span><br><span class="line">    &#x2F;&#x2F; whether this element marks the end of the stream; false means data keeps arriving indefinitely</span><br><span class="line">    @Override</span><br><span class="line">    public boolean isEndOfStream(ConsumerRecord&lt;String, String&gt; nextElement) &#123;</span><br><span class="line">        return false;</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;&#x2F; return a ConsumerRecord&lt;String, String&gt; that carries the topic, partition and offset in addition to the key and value</span><br><span class="line">    @Override</span><br><span class="line">    public ConsumerRecord&lt;String, String&gt; deserialize(ConsumerRecord&lt;byte[], byte[]&gt; record) throws Exception &#123;</span><br><span class="line">        return new ConsumerRecord&lt;String, String&gt;(</span><br><span class="line">                
record.topic(),</span><br><span class="line">                record.partition(),</span><br><span class="line">                record.offset(),</span><br><span class="line">                record.key() &#x3D;&#x3D; null ? null : new String(record.key()),</span><br><span class="line">                record.value() &#x3D;&#x3D; null ? null : new String(record.value())</span><br><span class="line">        );</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;&#x2F; the type of the records this schema produces (its output type)</span><br><span class="line">    @Override</span><br><span class="line">    public TypeInformation&lt;ConsumerRecord&lt;String, String&gt;&gt; getProducedType() &#123;</span><br><span class="line">        return TypeInformation.of(new TypeHint&lt;ConsumerRecord&lt;String, String&gt;&gt;()&#123;&#125;);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Once CustomDeSerializationSchema is defined, you can pass it to the FlinkKafkaConsumer constructor in place of SimpleStringSchema.</p><h4 id="Parition-和-Topic-动态发现"><a href="#Parition-和-Topic-动态发现" class="headerlink" title="Parition 和 Topic 动态发现"></a>Dynamic Partition and Topic Discovery</h4><p>In many scenarios a Kafka topic has to be given more partitions as the business grows. To avoid losing data because newly added partitions are not discovered in time, the consumer must notice partition changes while it runs; FlinkKafkaConsumer supports this through dynamic partition discovery.</p><p>Setting the following property is enough to enable dynamic partition discovery: the consumer then fetches the topic metadata at the given interval (in milliseconds, so every 10 ms here) and consumes any newly added partition from its earliest offset.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">properties.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, &quot;10&quot;);</span><br></pre></td></tr></table></figure><p>If the job also needs to discover new topics dynamically, subscribe with a regular expression over topic names; picking up newly created matching topics also relies on the discovery interval above being set:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">FlinkKafkaConsumer&lt;String&gt; consumer &#x3D; new FlinkKafkaConsumer&lt;&gt;(Pattern.compile(&quot;^test_([A-Za-z0-9]*)$&quot;), new SimpleStringSchema(), 
properties);</span><br></pre></td></tr></table></figure><h4 id="Flink-消费-Kafka-设置-offset-的方法"><a href="#Flink-消费-Kafka-设置-offset-的方法" class="headerlink" title="Flink 消费 Kafka 设置 offset 的方法"></a>Setting the Start Offset When Flink Consumes Kafka</h4><p>When Flink consumes Kafka, you need to specify where consumption starts, i.e. the <strong>offset</strong>. Flink offers five ways to choose the start position when reading messages from Kafka:</p><ul><li><p>Specific offsets for given topics and partitions</p></li><li><p>The earliest offset</p></li><li><p>A given point in time (timestamp)</p></li><li><p>The latest data</p></li><li><p>The offsets last committed by the consumer group</p></li></ul><p>These options are mutually exclusive: each <code>setStartFrom*</code> call overrides the previous one, so use only one of the calls below. Starting from the group offsets is the default.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">&#x2F;**</span><br><span class="line">* Start from the specified offsets of the given topic partitions</span><br><span class="line">*&#x2F;</span><br><span class="line">Map&lt;KafkaTopicPartition, Long&gt; offsets &#x3D; new HashMap&lt;&gt;();</span><br><span class="line">offsets.put(new KafkaTopicPartition(&quot;test&quot;, 0), 10000L);</span><br><span class="line">offsets.put(new KafkaTopicPartition(&quot;test&quot;, 1), 20000L);</span><br><span class="line">offsets.put(new KafkaTopicPartition(&quot;test&quot;, 2), 30000L);</span><br><span class="line">consumer.setStartFromSpecificOffsets(offsets);</span><br><span 
class="line">&#x2F;**</span><br><span class="line">* Flink从topic中最早的offset消费</span><br><span class="line">*&#x2F;</span><br><span class="line">consumer.setStartFromEarliest();</span><br><span class="line">&#x2F;**</span><br><span class="line">* Flink从topic中指定的时间点开始消费</span><br><span class="line">*&#x2F;</span><br><span class="line">consumer.setStartFromTimestamp(1559801580000l);</span><br><span class="line">&#x2F;**</span><br><span class="line">* Flink从topic中最新的数据开始消费</span><br><span class="line">*&#x2F;</span><br><span class="line">consumer.setStartFromLatest();</span><br><span class="line">&#x2F;**</span><br><span class="line">* Flink从topic中指定的group上次消费的位置开始消费，所以必须配置group.id参数</span><br><span class="line">*&#x2F;</span><br><span class="line">consumer.setStartFromGroupOffsets();</span><br></pre></td></tr></table></figure><h4 id="Offset提交"><a href="#Offset提交" class="headerlink" title="Offset提交"></a>Offset提交</h4><p>Flink Kafka Consumer允许配置offset提交回Kafka brokers(Kafka 0.8是写回Zookeeper)的行为，注意Flink Kafka Consumer 并不依赖于这个提交的offset来进行容错性保证，这个提交的offset仅仅作为监控consumer处理进度的一种手段。</p><p>配置offset提交行为的方式有多种，主要取决于Job的checkpoint机制是否启动。</p><p>1、<strong>checkpoint禁用</strong>:如果checkpoint禁用，Flink Kafka Consumer依赖于Kafka 客户端内部的自动周期性offset提交能力。因此，为了启用或者禁用offset提交，仅需在给定的Properties配置中设置enable.auto.commit / auto.commit.interval.ms，就会按固定的时间间隔定期 auto commit offset 到 kafka。</p><p>2、<strong>checkpoint启用</strong>:如果checkpoint启用，当checkpoint完成之后，Flink Kafka Consumer将会提交offset保存到checkpoint State中，这个时候作业消费的 offset 是 Flink 在 state 中自己管理和容错，保证了kafka broker中的committed offset与 checkpoint stata中的offset相一致。用户可以在Consumer中调用<code>setCommitOffsetsOnCheckpoints(boolean)</code> 方法来选择启用或者禁用offset committing，默认的情况下是setCommitOffsetsOnCheckpoints(true)，checkpoint成功后，将offset同步给kafka。注意，在这种情况下，配置在Properties中的自动周期性offset提交将会被完全忽略。</p><h3 id="源码解析"><a href="#源码解析" class="headerlink" title="源码解析"></a>源码解析</h3><p><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602237870349-32c86f4f-8115-4ce1-a4d6-8b197b0e513d.png#align=left&display=inline&height=1000&margin=%5Bobject%20Object%5D&originHeight=1000&originWidth=1928&size=0&status=done&style=none&width=1928" alt><br>As the class diagram above shows, FlinkKafkaConsumer extends FlinkKafkaConsumerBase, which in turn ultimately implements SourceFunction.</p><p>The overall flow: FlinkKafkaConsumer first creates a KafkaFetcher, and the KafkaFetcher then creates a KafkaConsumerThread and a Handover. The KafkaConsumerThread reads messages directly from Kafka and hands them to the Handover, from which the KafkaFetcher takes them and emits them through KafkaFetcher.emitRecord.</p><p>Because FlinkKafkaConsumerBase implements the RichFunction interface, FlinkKafkaConsumerBase.open is called first when the job starts:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span 
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span 
class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br></pre></td><td class="code"><pre><span class="line">public void open(Configuration configuration) throws Exception &#123;</span><br><span class="line">   &#x2F;&#x2F; determine the offset commit mode</span><br><span class="line">   this.offsetCommitMode &#x3D; OffsetCommitModes.fromConfiguration(</span><br><span class="line">         getIsAutoCommitEnabled(),</span><br><span class="line">         enableCommitOnCheckpoints,</span><br><span class="line">         ((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());</span><br><span class="line">   &#x2F;&#x2F; create the partition discoverer</span><br><span class="line">   this.partitionDiscoverer &#x3D; createPartitionDiscoverer(</span><br><span class="line">         topicsDescriptor,</span><br><span class="line">         getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">         
getRuntimeContext().getNumberOfParallelSubtasks());</span><br><span class="line">   this.partitionDiscoverer.open();</span><br><span class="line">   subscribedPartitionsToStartOffsets &#x3D; new HashMap&lt;&gt;();</span><br><span class="line">   final List&lt;KafkaTopicPartition&gt; allPartitions &#x3D; partitionDiscoverer.discoverPartitions();</span><br><span class="line">   if (restoredState !&#x3D; null) &#123;</span><br><span class="line">      for (KafkaTopicPartition partition : allPartitions) &#123;</span><br><span class="line">         if (!restoredState.containsKey(partition)) &#123;</span><br><span class="line">            restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);</span><br><span class="line">         &#125;</span><br><span class="line">      &#125;</span><br><span class="line">      for (Map.Entry&lt;KafkaTopicPartition, Long&gt; restoredStateEntry : restoredState.entrySet()) &#123;</span><br><span class="line">         if (!restoredFromOldState) &#123;</span><br><span class="line">           </span><br><span class="line">            if (KafkaTopicPartitionAssigner.assign(</span><br><span class="line">               restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())</span><br><span class="line">                  &#x3D;&#x3D; getRuntimeContext().getIndexOfThisSubtask())&#123;</span><br><span class="line">               subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());</span><br><span class="line">            &#125;</span><br><span class="line">         &#125; else &#123;</span><br><span class="line">           subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());</span><br><span class="line">         &#125;</span><br><span class="line">      &#125;</span><br><span class="line">      if (filterRestoredPartitionsWithCurrentTopicsDescriptor) &#123;</span><br><span class="line">         
subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -&gt; &#123;</span><br><span class="line">            if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) &#123;</span><br><span class="line">               LOG.warn(</span><br><span class="line">                  &quot;&#123;&#125; is removed from subscribed partitions since it is no longer associated with topics descriptor of current execution.&quot;,</span><br><span class="line">                  entry.getKey());</span><br><span class="line">               return true;</span><br><span class="line">            &#125;</span><br><span class="line">            return false;</span><br><span class="line">         &#125;);</span><br><span class="line">      &#125;</span><br><span class="line">      LOG.info(&quot;Consumer subtask &#123;&#125; will start reading &#123;&#125; partitions with offsets in restored state: &#123;&#125;&quot;,</span><br><span class="line">         getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets.size(), subscribedPartitionsToStartOffsets);</span><br><span class="line">   &#125; else &#123;</span><br><span class="line">    </span><br><span class="line">      switch (startupMode) &#123;</span><br><span class="line">         case SPECIFIC_OFFSETS:</span><br><span class="line">            if (specificStartupOffsets &#x3D;&#x3D; null) &#123;</span><br><span class="line">               throw new IllegalStateException(</span><br><span class="line">                  &quot;Startup mode for the consumer set to &quot; + StartupMode.SPECIFIC_OFFSETS +</span><br><span class="line">                     &quot;, but no specific offsets were specified.&quot;);</span><br><span class="line">            &#125;</span><br><span class="line">            for (KafkaTopicPartition seedPartition : allPartitions) &#123;</span><br><span class="line">               Long specificOffset &#x3D; specificStartupOffsets.get(seedPartition);</span><br><span class="line">       
        if (specificOffset !&#x3D; null) &#123;</span><br><span class="line">                  subscribedPartitionsToStartOffsets.put(seedPartition, specificOffset - 1);</span><br><span class="line">               &#125; else &#123;</span><br><span class="line">                  subscribedPartitionsToStartOffsets.put(seedPartition, KafkaTopicPartitionStateSentinel.GROUP_OFFSET);</span><br><span class="line">               &#125;</span><br><span class="line">            &#125;</span><br><span class="line">            break;</span><br><span class="line">         case TIMESTAMP:</span><br><span class="line">            if (startupOffsetsTimestamp &#x3D;&#x3D; null) &#123;</span><br><span class="line">               throw new IllegalStateException(</span><br><span class="line">                  &quot;Startup mode for the consumer set to &quot; + StartupMode.TIMESTAMP +</span><br><span class="line">                     &quot;, but no startup timestamp was specified.&quot;);</span><br><span class="line">            &#125;</span><br><span class="line">            for (Map.Entry&lt;KafkaTopicPartition, Long&gt; partitionToOffset</span><br><span class="line">                  : fetchOffsetsWithTimestamp(allPartitions, startupOffsetsTimestamp).entrySet()) &#123;</span><br><span class="line">               subscribedPartitionsToStartOffsets.put(</span><br><span class="line">                  partitionToOffset.getKey(),</span><br><span class="line">                  (partitionToOffset.getValue() &#x3D;&#x3D; null)</span><br><span class="line">                        ? KafkaTopicPartitionStateSentinel.LATEST_OFFSET</span><br><span class="line">                        : partitionToOffset.getValue() - 1);</span><br><span class="line">            &#125;</span><br><span class="line">            break;</span><br><span class="line">         default:</span><br><span class="line">            for (KafkaTopicPartition seedPartition : allPartitions) &#123;</span><br><span 
class="line">               subscribedPartitionsToStartOffsets.put(seedPartition, startupMode.getStateSentinel());</span><br><span class="line">            &#125;</span><br><span class="line">      &#125;</span><br><span class="line">      if (!subscribedPartitionsToStartOffsets.isEmpty()) &#123;</span><br><span class="line">         switch (startupMode) &#123;</span><br><span class="line">            case EARLIEST:</span><br><span class="line">               LOG.info(&quot;Consumer subtask &#123;&#125; will start reading the following &#123;&#125; partitions from the earliest offsets: &#123;&#125;&quot;,</span><br><span class="line">                  getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.size(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.keySet());</span><br><span class="line">               break;</span><br><span class="line">            case LATEST:</span><br><span class="line">               LOG.info(&quot;Consumer subtask &#123;&#125; will start reading the following &#123;&#125; partitions from the latest offsets: &#123;&#125;&quot;,</span><br><span class="line">                  getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.size(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.keySet());</span><br><span class="line">               break;</span><br><span class="line">            case TIMESTAMP:</span><br><span class="line">               LOG.info(&quot;Consumer subtask &#123;&#125; will start reading the following &#123;&#125; partitions from timestamp &#123;&#125;: &#123;&#125;&quot;,</span><br><span class="line">                  getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.size(),</span><br><span class="line">                  
startupOffsetsTimestamp,</span><br><span class="line">                  subscribedPartitionsToStartOffsets.keySet());</span><br><span class="line">               break;</span><br><span class="line">            case SPECIFIC_OFFSETS:</span><br><span class="line">               LOG.info(&quot;Consumer subtask &#123;&#125; will start reading the following &#123;&#125; partitions from the specified startup offsets &#123;&#125;: &#123;&#125;&quot;,</span><br><span class="line">                  getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.size(),</span><br><span class="line">                  specificStartupOffsets,</span><br><span class="line">                  subscribedPartitionsToStartOffsets.keySet());</span><br><span class="line">               List&lt;KafkaTopicPartition&gt; partitionsDefaultedToGroupOffsets &#x3D; new ArrayList&lt;&gt;(subscribedPartitionsToStartOffsets.size());</span><br><span class="line">               for (Map.Entry&lt;KafkaTopicPartition, Long&gt; subscribedPartition : subscribedPartitionsToStartOffsets.entrySet()) &#123;</span><br><span class="line">                  if (subscribedPartition.getValue() &#x3D;&#x3D; KafkaTopicPartitionStateSentinel.GROUP_OFFSET) &#123;</span><br><span class="line">                     partitionsDefaultedToGroupOffsets.add(subscribedPartition.getKey());</span><br><span class="line">                  &#125;</span><br><span class="line">               &#125;</span><br><span class="line">               if (partitionsDefaultedToGroupOffsets.size() &gt; 0) &#123;</span><br><span class="line">                  LOG.warn(&quot;Consumer subtask &#123;&#125; cannot find offsets for the following &#123;&#125; partitions in the specified startup offsets: &#123;&#125;&quot; +</span><br><span class="line">                        &quot;; their startup offsets will be defaulted to their committed group offsets in Kafka.&quot;,</span><br><span 
class="line">                     getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                     partitionsDefaultedToGroupOffsets.size(),</span><br><span class="line">                     partitionsDefaultedToGroupOffsets);</span><br><span class="line">               &#125;</span><br><span class="line">               break;</span><br><span class="line">            case GROUP_OFFSETS:</span><br><span class="line">               LOG.info(&quot;Consumer subtask &#123;&#125; will start reading the following &#123;&#125; partitions from the committed group offsets in Kafka: &#123;&#125;&quot;,</span><br><span class="line">                  getRuntimeContext().getIndexOfThisSubtask(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.size(),</span><br><span class="line">                  subscribedPartitionsToStartOffsets.keySet());</span><br><span class="line">         &#125;</span><br><span class="line">      &#125; else &#123;</span><br><span class="line">         LOG.info(&quot;Consumer subtask &#123;&#125; initially has no partitions to read from.&quot;,</span><br><span class="line">            getRuntimeContext().getIndexOfThisSubtask());</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The core logic for reading data from Kafka topics and partitions lives in the run method:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line">public void run(SourceContext&lt;T&gt; sourceContext) throws Exception &#123;</span><br><span class="line">   if (subscribedPartitionsToStartOffsets &#x3D;&#x3D; null) &#123;</span><br><span class="line">      throw new Exception(&quot;The partitions were not set for the consumer&quot;);</span><br><span class="line">   &#125;</span><br><span class="line">   this.successfulCommits &#x3D; this.getRuntimeContext().getMetricGroup().counter(COMMITS_SUCCEEDED_METRICS_COUNTER);</span><br><span class="line">   this.failedCommits &#x3D;  this.getRuntimeContext().getMetricGroup().counter(COMMITS_FAILED_METRICS_COUNTER);</span><br><span class="line">   final int subtaskIndex &#x3D; this.getRuntimeContext().getIndexOfThisSubtask();</span><br><span class="line">   this.offsetCommitCallback &#x3D; new KafkaCommitCallback() &#123;</span><br><span class="line">      @Override</span><br><span class="line">      public void onSuccess() &#123;</span><br><span class="line">         successfulCommits.inc();</span><br><span class="line">      &#125;</span><br><span class="line">      
@Override</span><br><span class="line">      public void onException(Throwable cause) &#123;</span><br><span class="line">         LOG.warn(String.format(&quot;Consumer subtask %d failed async Kafka commit.&quot;, subtaskIndex), cause);</span><br><span class="line">         failedCommits.inc();</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;;</span><br><span class="line">   if (subscribedPartitionsToStartOffsets.isEmpty()) &#123;</span><br><span class="line">      sourceContext.markAsTemporarilyIdle();</span><br><span class="line">   &#125;</span><br><span class="line">   LOG.info(&quot;Consumer subtask &#123;&#125; creating fetcher with offsets &#123;&#125;.&quot;,</span><br><span class="line">      getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets);</span><br><span class="line">  </span><br><span class="line">   this.kafkaFetcher &#x3D; createFetcher(</span><br><span class="line">         sourceContext,</span><br><span class="line">         subscribedPartitionsToStartOffsets,</span><br><span class="line">         periodicWatermarkAssigner,</span><br><span class="line">         punctuatedWatermarkAssigner,</span><br><span class="line">         (StreamingRuntimeContext) getRuntimeContext(),</span><br><span class="line">         offsetCommitMode,</span><br><span class="line">         getRuntimeContext().getMetricGroup().addGroup(KAFKA_CONSUMER_METRICS_GROUP),</span><br><span class="line">         useMetrics);</span><br><span class="line">   if (!running) &#123;</span><br><span class="line">      return;</span><br><span class="line">   &#125;</span><br><span class="line">   if (discoveryIntervalMillis &#x3D;&#x3D; PARTITION_DISCOVERY_DISABLED) &#123;</span><br><span class="line">      kafkaFetcher.runFetchLoop();</span><br><span class="line">   &#125; else &#123;</span><br><span class="line">      runWithPartitionDiscovery();</span><br><span class="line">   &#125;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Flink-消费-Kafka-数据代码"><a href="#Flink-消费-Kafka-数据代码" class="headerlink" title="Flink 消费 Kafka 数据代码"></a>Code for consuming Kafka data with Flink</h3><p>The sections above covered how Flink consumes Kafka, how messages are serialized and deserialized, and how partitions and topics are discovered dynamically. Back in our project, the complete code for consuming Kafka data is as follows:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">import java.util.Properties;</span><br><span class="line"></span><br><span class="line">import org.apache.flink.api.common.functions.FlatMapFunction;</span><br><span class="line">import org.apache.flink.api.common.serialization.SimpleStringSchema;</span><br><span class="line">import org.apache.flink.streaming.api.CheckpointingMode;</span><br><span class="line">import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;</span><br><span class="line">import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;</span><br><span class="line">import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase;</span><br><span class="line">import org.apache.flink.util.Collector;</span><br><span class="line"></span><br><span class="line">public class KafkaConsumer &#123;</span><br><span class="line">    public static void main(String[] args) throws Exception &#123;</span><br><span class="line">        StreamExecutionEnvironment env &#x3D; StreamExecutionEnvironment.getExecutionEnvironment();</span><br><span class="line">        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);</span><br><span class="line">        env.enableCheckpointing(5000);</span><br><span class="line">        Properties properties &#x3D; new Properties();</span><br><span class="line">        properties.setProperty(&quot;bootstrap.servers&quot;, &quot;127.0.0.1:9092&quot;);</span><br><span class="line">        &#x2F;&#x2F; set the consumer group</span><br><span class="line">        properties.setProperty(&quot;group.id&quot;, &quot;group_test&quot;);</span><br><span 
class="line">        properties.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, &quot;10000&quot;); &#x2F;&#x2F; check for new partitions every 10 s</span><br><span class="line">        FlinkKafkaConsumer&lt;String&gt; consumer &#x3D; new FlinkKafkaConsumer&lt;&gt;(&quot;test&quot;, new SimpleStringSchema(), properties);</span><br><span class="line">        &#x2F;&#x2F; start consuming from the earliest offset</span><br><span class="line">        consumer.setStartFromEarliest();</span><br><span class="line">        env.addSource(consumer).flatMap(new FlatMapFunction&lt;String, String&gt;() &#123;</span><br><span class="line">            @Override</span><br><span class="line">            public void flatMap(String value, Collector&lt;String&gt; out) throws Exception &#123;</span><br><span class="line">                System.out.println(value);</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;);</span><br><span class="line">        env.execute(&quot;start consumer...&quot;);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>We can run the code directly from the IDE (right-click, then Run) and see the data printed to the console as expected:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602237870409-f14c04fa-493b-4c04-ad56-39a6b896b8f7.png#align=left&display=inline&height=1022&margin=%5Bobject%20Object%5D&originHeight=1022&originWidth=2798&size=0&status=done&style=none&width=2798" alt><br>As the console output shows, the messages we sent to Kafka earlier are all printed.</p><h3 id="Q-amp-A"><a href="#Q-amp-A" class="headerlink" title="Q&amp;A"></a>Q&amp;A</h3><p>If a checkpoint takes too long and the offsets have not yet been committed to Kafka when a node goes down, how is duplicate consumption handled after the restart?</p><p><strong>First, with checkpointing enabled, offsets are managed and restored by Flink through its own state</strong>, not restored from the offset position stored in Kafka. Under the checkpoint mechanism, the job resumes from the most recent checkpoint, which by itself replays some historical data, so part of the data is consumed again. The Flink engine only guarantees exactly-once for its computation state; <strong>end-to-end exactly-once requires idempotent storage systems or transactional operations.</strong></p>]]></content>
    
    <summary type="html">
    
      
      
        &lt;p&gt;In real-time computing scenarios, the vast majority of data sources are messaging systems, and Kafka stands out among the many message middleware products mainly because of its &lt;strong&gt;high throughput&lt;/strong
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="Flink" scheme="cpeixin.cn/tags/Flink/"/>
    
  </entry>
  
  <entry>
    <title>About NULL in Hive</title>
    <link href="cpeixin.cn/2020/11/19/%E5%85%B3%E4%BA%8EHive%E4%B8%AD%E7%9A%84NULL/"/>
    <id>cpeixin.cn/2020/11/19/%E5%85%B3%E4%BA%8EHive%E4%B8%AD%E7%9A%84NULL/</id>
    <published>2020-11-18T16:35:14.000Z</published>
    <updated>2020-11-18T16:40:22.796Z</updated>
    
    <content type="html"><![CDATA[<p>Lately, most of my colleagues have been using Hive to query data. Backend programmers are more familiar with databases such as MySQL and Oracle, and although Hive's SQL is very similar to theirs, there are still differences that trip people up. One of them is how to test for empty values in Hive, so here are some notes on NULL in Hive.</p><p>How Hive distinguishes NULL from '' (the empty string):</p><ol><li>Storage rules for empty values by data type<ul><li>for both int and string columns, NULL is stored as \N by default;</li><li>for a string column, an empty string "" is stored as "";</li><li>inserting "" into an int column still results in \N.</li></ul></li></ol><ol start="2"><li>Querying empty values by data type<ul><li>for an int column, use is null to test for empty values;</li><li>for a string column, the condition is null matches rows stored as \N, while the condition = '' matches rows stored as "".</li></ul></li></ol><p>Here is an example:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1605716033203-db3a4544-2117-4ca7-b6a1-c68c20f54841.png#align=left&display=inline&height=244&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-11-19%20%E4%B8%8A%E5%8D%8812.13.16.png&originHeight=244&originWidth=3340&size=291515&status=done&style=none&width=3340" alt="截屏2020-11-19 上午12.13.16.png"><br>In the table t_user_details above, user brent has a NULL register_date. How should we find brent by filtering on register_date?</p><p>Here are the results:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1605716239115-1f583488-569b-4bc1-a03b-e57cbcb2d67d.png#align=left&display=inline&height=352&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-11-19%20%E4%B8%8A%E5%8D%8812.17.02.png&originHeight=352&originWidth=1656&size=199561&status=done&style=none&width=1656" alt="截屏2020-11-19 上午12.17.02.png"><br>In the queries above, register_date = '' does not find brent. This follows from the rule stated earlier: <strong>for a string column, the condition is null matches rows stored as \N, while the condition = '' matches rows stored as ""</strong>.</p><p>So a column = '' condition only matches rows where the empty string '' was actually written to that column, and for those rows <strong>the is null condition does not match</strong>.</p><p>Now let's verify this in the underlying HDFS storage:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1605716536922-7a8f58e0-e062-4682-8e4b-5347115bc4b0.png#align=left&display=inline&height=1070&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-11-19%20%E4%B8%8A%E5%8D%8812.21.26.png&originHeight=1070&originWidth=2470&size=216783&status=done&style=none&width=2470" 
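alt="截屏2020-11-19 上午12.21.26.png"></p><p>The screenshot confirms it: by default, Hive stores NULL in HDFS as \N.</p><p>The table property serialization.null.format specifies how NULL is stored and recognized in Hive. It can stay at the default \N, or be set to another string such as NULL or ''.</p><blockquote><p>For example: ALTER TABLE b SET SERDEPROPERTIES ('serialization.null.format'='');</p></blockquote><p>If a table contains many NULL values, its data files will contain many \N markers, which wastes storage space; in that case we can set serialization.null.format to ''. Note that NULL and the empty string can then no longer be told apart in that table's files.</p><p>So when we don't know how the upstream system wrote the data, how can we conveniently check for empty values?</p><p>Here is a catch-all pattern that matches both NULL and empty strings:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">select * from t_user_details where register_date is null or register_date &#x3D; &#39;&#39;;</span><br></pre></td></tr></table></figure>]]></content>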
    
    <summary type="html">
    
      
      
        &lt;p&gt;Lately, most of my colleagues have been using Hive to query data. Backend programmers are more familiar with databases such as MySQL and Oracle, and although Hive's SQL
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="hive" scheme="cpeixin.cn/tags/hive/"/>
    
  </entry>
  
  <entry>
    <title>Technology selection considerations for DMP system design</title>
    <link href="cpeixin.cn/2020/10/28/DMP%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1%E7%9A%84%E9%80%89%E5%9E%8B%E8%80%83%E9%87%8F/"/>
    <id>cpeixin.cn/2020/10/28/DMP%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1%E7%9A%84%E9%80%89%E5%9E%8B%E8%80%83%E9%87%8F/</id>
    <published>2020-10-27T16:14:18.000Z</published>
    <updated>2020-10-27T16:18:20.600Z</updated>
    
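    <content type="html"><![CDATA[<p><a name="yh9UI"></a></p><h3 id="DMP：数据管理平台"><a href="#DMP：数据管理平台" class="headerlink" title="DMP：数据管理平台"></a>DMP: Data Management Platform</h3><p>Let's start with what a DMP is. DMP stands for Data Management Platform, and such systems are widely used in Internet ad targeting and personalized recommendation.</p><p>Generally speaking, a DMP processes massive amounts of Internet access data with machine learning algorithms and attaches all kinds of tags to each user. Then, when we do personalized recommendation and ad serving, we use these tags for the actual ad ranking and recommendation work. Whether it is Google's search ads, Taobao's personalized ("a thousand faces for a thousand users") product listings, or Douyin's feed recommendations, there is a DMP behind each of them.</p><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1603813744888-8241c230-5870-42fd-8122-f83b89894b21.png#align=left&display=inline&height=1043&margin=%5Bobject%20Object%5D&name=image.png&originHeight=2086&originWidth=2122&size=517302&status=done&style=none&width=1061" alt="image.png"></p><p>So how should a DMP be built? To the external systems and users that consume it, a DMP can simply be seen as a key-value (KV) database: our ad system or recommendation system passes a user's unique identifier (ID) in through a client and gets back all kinds of information about that user.</p><p>Some of this information is demographic, such as gender and age. Some is very specific behavior, such as which products the user viewed recently or which phone model the user has. Some is interests computed by our algorithms, such as the user liking fitness or music. And some is user vectors produced entirely by machine learning algorithms, which serve as input to the downstream recommendation or ad algorithms.</p><p>Given this, our expectations for the KV database are clear: low response time, high availability, high concurrency, and big data volume, <strong>while keeping the cost affordable</strong>. Quantified, these expectations look like this:</p><ol><li>Low response time: a typical ad system leaves only about 10 ms for the entire ad-serving decision, so fetching user data from the DMP is expected to take less than 1 ms.</li><li>High availability: DMPs are commonly used in ad systems. If the DMP fails, the ad revenue for the whole outage is usually simply lost, so there is practically no upper limit to the availability we want. Google's ad revenue in 2018 was USD 116 billion, about USD 220,000 per minute. Even at 99.99% availability (about 4.3 minutes of downtime in a 30-day month), we would still lose roughly USD 1 million every month.</li><li>High concurrency: again taking the ad system as an example, if we serve 10 billion ad requests per day, the average load is 10 billion / 86,400 ≈ 116K requests per second, so the DMP must support high concurrency.</li><li>Data volume: if our product targets the Chinese market, we need about 1 billion keys. Assuming each user has 500 tags, each tag has a score, and both tags and scores are 4-byte integers, we need 1 billion × 500 × (4 + 4) bytes = 4 TB of data.</li><li>Low cost: again from the ad system's perspective. Ad revenue is usually measured in CPM (Cost Per Mille), i.e., per thousand impressions. If the profit per thousand impressions is USD 0.10, then 10 billion impressions per day yield USD 1 million in profit, which sounds like a lot. But turn the calculation around and you'll see that the cost of every 1,000 DMP requests must not exceed USD 0.10; ideally it should be USD 0.01 or even lower, so that we keep as much of the ad profit as possible.</li></ol><p>Combine these five factors and it no longer sounds so simple, does it? And the more complicated part is still ahead. From the outside a DMP looks very simple, just a KV database, but producing that database takes much more work. Let's take a look.</p><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1603814024445-565124fd-d79e-42de-98df-4dbf9a462042.png#align=left&display=inline&height=1564&margin=%5Bobject%20Object%5D&name=image.png&originHeight=3127&originWidth=2920&size=1141158&status=done&style=none&width=1460" alt="image.png"></p><p>To produce this KV database, we need a data collection module on the client or web side that continuously collects user behavior and sends it to backend servers. When the servers receive the data, they put it into a data pipeline. Downstream of the pipeline, the data is landed in a data warehouse, where all of it is stored in structured form. Later, programs can analyze these logs to produce reports, or use the data to run various machine learning algorithms.</p><p>Besides the data warehouse, there is also a realtime data processing module downstream of the pipeline. It also reads data from the pipeline, runs various real-time computations, and writes the results into the DMP's KV database.</p><p><a name="x6WvP"></a></p><h3 id="MongoDB-真的万能吗？"><a href="#MongoDB-真的万能吗？" class="headerlink" title="MongoDB 真的万能吗？"></a>Is MongoDB really a silver bullet?</h3><p>Given these three different storage needs (the KV database, the data pipeline, and the data warehouse), what is the most reasonable technical solution? Take a moment to think about it before reading on. Many good web programmers I've worked with answer this question with: "What's so hard about that? Just use MongoDB!" If you picked MongoDB too, the end result would certainly be a disaster.</p><p>Why do I say that? MongoDB's design sounds impressive: no predefined schema, fast access, and unlimited horizontal scaling. As a KV store, MongoDB could serve as the DMP's KV database; on top of that, since it scales horizontally and can run MQL queries, we could also use it as the data warehouse.</p><p>As for the data pipeline, we would just keep inserting new data into MongoDB. From an operations perspective, we would only have one kind of database to maintain, and the tech stack would be simpler. MongoDB looks like a perfect choice! But as a veteran programmer, my first reaction on hearing about such an "all-purpose" solution was: "There's no such thing as a free lunch." Every software system has the scenarios it is suited for, and expecting one solution to fit three very different use cases is neither reasonable nor realistic. Let's look closely at where exactly it is "unreasonable" and "unrealistic". We have already covered the expected use case and performance requirements of the DMP's KV database, so let's now look at the performance trade-offs of the data pipeline and the data warehouse.</p><p>For the data pipeline, what we need is high throughput. Its concurrency is similar to that of the KV database, but its response-time requirements are much looser: a delay of 1 to 2 seconds, or even a few more, is acceptable. Also, unlike the KV database, the pipeline's reads and writes are sequential, with no need for heavy random reads and writes.</p><p>The data warehouse is different again: it reads far more data than the pipeline. The pipeline only reads back what was written: with 10 TB of log data per day, the pipeline writes only 10 TB. The downstream data warehouse (to store the data) and the realtime processing module each read it once, adding 2 × 10 TB, so about 20 TB of reads is enough.</p><p>Data analysis jobs on the warehouse, however, read far more. On one hand, we may analyze a week's, a month's, or even a quarter's worth of data, so a single analysis reads not 10 TB but 100 TB or even 1 PB; and we run not one but thousands of analysis jobs on the warehouse every day, so the total volume read is enormous. On the other hand, the warehouse does not keep data for just a few hours, or a day at most, as the pipeline does; it often keeps 3 months or even a year of data, so we need 1 PB to 5 PB of storage. The table below summarizes the use cases of the KV database, the data pipeline, and the data warehouse. Compare them and think about why MongoDB is not a good fit for any of the three.</p><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1603814338774-f9182041-39bb-42f1-8e84-6de7dcdaef2a.png#align=left&display=inline&height=1013&margin=%5Bobject%20Object%5D&name=image.png&originHeight=2026&originWidth=1906&size=735334&status=done&style=none&width=953" alt="image.png"></p><p>In the KV database scenario, we need to support high concurrency. MongoDB would then have to keep more of the data in memory, which makes storage very expensive.</p><p>In the data pipeline scenario, we need large volumes of sequential reads and writes. MongoDB is a document database that has not been optimized for sequential writes or throughput, so it does not look suitable either.</p><p>In the data warehouse scenario, reads are mainly sequential and massive storage is required. Document databases like MongoDB are not optimized for massive sequential reads either, so it is still not the best solution. Moreover, document databases always carry a lot of redundant field metadata, which wastes even more storage space.</p><p><a name="GOWRs"></a></p><h3 id="那我们该选择什么样的解决方案呢？"><a href="#那我们该选择什么样的解决方案呢？" class="headerlink" title="那我们该选择什么样的解决方案呢？"></a>So which solutions should we choose?</h3><p>Once we start from our use cases, the solutions are not hard to find. For the KV database, the best choice is naturally SSDs with a dedicated KV database such as Aerospike. Highly concurrent random access is a poor fit for spinning HDDs, while keeping terabytes of data like this in memory would cost too much.</p><p>For the data pipeline, the best choice is naturally Kafka. Since we are after throughput, Kafka, which uses zero-copy and DMA, maximizes throughput as a data pipeline. And because the pipeline's reads and writes are all sequential, we don't need to support random access, so HDDs are enough.</p><p>The data warehouse holds even more data. At the hardware level, HDDs become mandatory; otherwise storage would cost about 10 times more. For this much data, we need a clearly defined schema so that no field has to carry extra metadata, and the data can be stored with binary serialization formats such as Avro, Thrift, or Protocol Buffers; alternatively, we can simply use a data warehouse product with explicit field definitions, such as Hive. Clearly, MongoDB's schema-less data model does not work well here.</p><p><a name="PGrxO"></a></p><h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>By now you should have some feel for how to choose a tech stack from first principles: think more about the characteristics and principles of the underlying storage systems. Once you look at problems this way, you will develop some immunity to the PR pieces for new technology projects and products, and you won't easily make technology choices based on vendors' marketing.</p><p>The DMP's KV database needs low latency and high concurrency and serves far more reads than writes, so it is best served by SSDs and a dedicated KV database. You can choose Aerospike, or use the open-source Cassandra.</p><p>The data pipeline mainly does sequential reads and sequential writes, so it doesn't necessarily need SSDs; HDDs will do. However, to maximize throughput, zero-copy and DMA are indispensable, which is why Kafka is now the standard solution for data pipelines.</p><p>A data warehouse is usually write-once, read-many. And because the volume of stored data is large, cost has to be considered too. So, on one hand, we use HDDs rather than SSDs; on the other, we usually define the schema up front, so that serializing a single record does not require storing redundant metadata such as field names, as JSON or MongoDB's BSON do.</p><p>So the most common solution is a Hadoop cluster with a data warehouse system such as Hive, or binary serialization schemes such as Avro, Thrift, or Protocol Buffers. When designing a large DMP, we need to choose different combinations of hardware and software for the different components, based on the actual conditions of each use case.</p><p><a name="Ob00f"></a></p><h3 id="思考"><a href="#思考" class="headerlink" title="思考"></a>Food for thought</h3><p>In this article we discussed Kafka, the open-source system commonly used for data pipelines, and chose spinning HDDs for it. Is it worth using SSDs for Kafka? If we did, what would the benefits and drawbacks be?</p><blockquote><p>Looking at the use case, SSDs really aren't necessary yet. In terms of data volume, a large number of HDDs can already meet today's high-concurrency needs. Kafka's frequent writes are also a concern for SSD lifespan, and HDDs are more economical than SSDs in both cost and service life.</p></blockquote><blockquote><p>That said, SSDs are much cheaper than they were a few years ago, and with PCIe interfaces now widespread, their sequential read/write speed can clearly outpace HDDs. So it is gradually becoming fairly common in the industry to deploy Kafka directly on SSDs.</p></blockquote>]]></content>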
    
    <summary type="html">
    
      
      
        &lt;p&gt;&lt;a name=&quot;yh9UI&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h3 id=&quot;DMP：数据管理平台&quot;&gt;&lt;a href=&quot;#DMP：数据管理平台&quot; class=
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="DMP" scheme="cpeixin.cn/tags/DMP/"/>
    
  </entry>
  
  <entry>
    <title>Bloom filter source code implementation in HBase</title>
    <link href="cpeixin.cn/2020/10/03/HBase%E4%B8%ADbloomfilter%E6%BA%90%E7%A0%81%E5%AE%9E%E7%8E%B0/"/>
    <id>cpeixin.cn/2020/10/03/HBase%E4%B8%ADbloomfilter%E6%BA%90%E7%A0%81%E5%AE%9E%E7%8E%B0/</id>
    <published>2020-10-03T08:10:56.000Z</published>
    <updated>2020-10-07T08:11:45.114Z</updated>
    
    <content type="html"><![CDATA[<p><a name="iMuzZ"></a></p><h2 id="Bloom-filter最优的大小计算"><a href="#Bloom-filter最优的大小计算" class="headerlink" title="Bloom filter最优的大小计算"></a>Computing the optimal Bloom filter size</h2><p>Bloom filters are very sensitive to the number of elements inserted into them. For HBase, the number of entries depends on the size of the data stored in the column. Assuming the default region size of 256 MB, the entry count is roughly 256 MB / (average value size in the column). Despite this rule of thumb, there is no efficient way to compute the entry count after compactions. Therefore, it is usually easier to use a dynamic Bloom filter that adds extra space than to let the error rate grow.</p><p>The optimal Bloom filter size is: bloom size m = -n * ln(err) / (ln(2))^2 ≈ n * ln(err) / ln(0.6185), where</p><ul><li>m is the number of bits in the Bloom filter (bitSize)</li><li>n is the number of elements inserted into the Bloom filter (maxKeys)</li><li>k is the number of hash functions used (nbHash)</li><li>e is the desired false-positive rate of the Bloom filter (err)</li></ul><p>The false-positive probability is minimized when k = (m / n) * ln(2).</p><p>The Bloom filter type is set with setBloomFilterType when a column family is created. HBase supports three Bloom filter types: ROW, ROWCOL, and ROWPREFIX_FIXED_LENGTH. A new column family defaults to ROW, which writes the row key of every inserted cell into the Bloom filter.<br><a name="7sOVz"></a></p><h2 id="HBase中布隆过滤器的实现"><a href="#HBase中布隆过滤器的实现" class="headerlink" title="HBase中布隆过滤器的实现"></a>Bloom filter implementation in HBase</h2><p><a name="GaaNu"></a></p><h2 id="Bloom-filter接口实现"><a href="#Bloom-filter接口实现" class="headerlink" title="Bloom filter接口实现"></a>Bloom filter interfaces</h2><p>HBase's Bloom filters are defined by the parent interface BloomFilterBase, which has two sub-interfaces: BloomFilter and BloomFilterWriter.<br>BloomFilter handles reads and membership checks, while BloomFilterWriter writes data into the Bloom filter.<br><a name="Ck4Le"></a></p><h3 id="read类"><a href="#read类" class="headerlink" title="read类"></a>Read interface</h3><p>BloomFilter handles reading data and checking membership, and defines three methods:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#x2F;&#x2F; check whether keyCell may be contained in the Bloom filter</span><br><span class="line">boolean contains(Cell keyCell, ByteBuff bloom, BloomType type);</span><br><span class="line">boolean contains(byte[] buf, int offset, int length, ByteBuff bloom);</span><br><span 
class="line"> </span><br><span class="line">&#x2F;&#x2F; whether the Bloom filter can load its data automatically (true in the implementation discussed here)</span><br><span class="line">boolean supportsAutoLoading();</span><br></pre></td></tr></table></figure><p>The concrete implementation of BloomFilter is CompoundBloomFilter, whose core method is contains:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">public boolean contains(Cell keyCell, ByteBuff bloom, BloomType type) &#123;</span><br><span class="line">  </span><br><span class="line">  &#x2F;&#x2F; if no block in the root index can contain keyCell, return false; the root index is built when the HFile is created and does not index every row key</span><br><span class="line">  int block &#x3D; index.rootBlockContainingKey(keyCell);</span><br><span class="line">  if (block &lt; 0) &#123;</span><br><span class="line">    return false; &#x2F;&#x2F; This key is not in the file.</span><br><span class="line">  &#125;</span><br><span class="line">  boolean result;</span><br><span class="line"> </span><br><span class="line">  &#x2F;&#x2F; load the Bloom block</span><br><span class="line">  HFileBlock bloomBlock &#x3D; getBloomBlock(block);</span><br><span 
class="line">  try &#123;</span><br><span class="line">    ByteBuff bloomBuf &#x3D; bloomBlock.getBufferReadOnly();</span><br><span class="line">   </span><br><span class="line">    &#x2F;&#x2F; delegate the membership check to BloomFilterUtil.contains</span><br><span class="line"> </span><br><span class="line">    result &#x3D; BloomFilterUtil.contains(keyCell, bloomBuf, bloomBlock.headerSize(),</span><br><span class="line">        bloomBlock.getUncompressedSizeWithoutHeader(), hash, hashCount, type);</span><br><span class="line">  &#125; finally &#123;</span><br><span class="line">    &#x2F;&#x2F; After the use return back the block if it was served from a cache.</span><br><span class="line">    reader.returnBlock(bloomBlock);</span><br><span class="line">  &#125;</span><br><span class="line">  if (numPositivesPerChunk !&#x3D; null &amp;&amp; result) &#123;</span><br><span class="line">    &#x2F;&#x2F; Update statistics. Only used in unit tests.</span><br><span class="line">    ++numPositivesPerChunk[block];</span><br><span class="line">  &#125;</span><br><span class="line">  return result;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>BloomFilterUtil.contains builds a different hash key depending on the BloomType (RowColBloomHashKey for ROWCOL, RowBloomHashKey otherwise), then reads the bit values in bloomBuf, computes the hash positions of the cell's key, and checks whether the corresponding bits are set to 1.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">public static boolean contains(Cell cell, ByteBuff bloomBuf, int bloomOffset, int bloomSize,</span><br><span class="line">    Hash hash, int hashCount, BloomType type) &#123;</span><br><span class="line">  HashKey&lt;Cell&gt; hashKey &#x3D; type &#x3D;&#x3D; BloomType.ROWCOL ? 
new RowColBloomHashKey(cell)</span><br><span class="line">      : new RowBloomHashKey(cell);</span><br><span class="line"></span><br><span class="line">  &#x2F;&#x2F; the final check, whose implementation is fairly simple</span><br><span class="line">  return contains(bloomBuf, bloomOffset, bloomSize, hash, hashCount, hashKey);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="E1i1v"></a></p><h3 id="write类"><a href="#write类" class="headerlink" title="write类"></a>Write interface</h3><p>BloomFilterWriter defines several write-related methods:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"> </span><br><span class="line">&#x2F;&#x2F; compact the Bloom filter before it is written to disk</span><br><span class="line">void compactBloom();</span><br><span class="line"> </span><br><span class="line">&#x2F;&#x2F; get a writer for the metadata (Bloom type, data size, etc.)</span><br><span class="line">Writable getMetaWriter();</span><br><span class="line"> </span><br><span class="line">&#x2F;&#x2F; get a writer for the Bloom bits</span><br><span class="line">Writable getDataWriter();</span><br><span class="line"> </span><br><span class="line">&#x2F;&#x2F; previous cell written</span><br><span class="line">Cell getPrevCell();</span><br></pre></td></tr></table></figure><p>The concrete implementation of BloomFilterWriter is CompoundBloomFilterWriter. Its core method is append, which writes each added cell into the Bloom filter:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">@Override</span><br><span class="line">public void append(Cell cell) throws IOException &#123;</span><br><span class="line">  if (cell &#x3D;&#x3D; null)</span><br><span class="line">    throw new NullPointerException();</span><br><span class="line"> </span><br><span class="line">  enqueueReadyChunk(false);</span><br><span class="line"> </span><br><span class="line">  if (chunk &#x3D;&#x3D; null) &#123;</span><br><span class="line">    if (firstKeyInChunk !&#x3D; null) &#123;</span><br><span class="line">      throw new IllegalStateException(&quot;First key in chunk already set: &quot;</span><br><span class="line">          + Bytes.toStringBinary(firstKeyInChunk));</span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span class="line">    &#x2F;&#x2F; First append to a new chunk: record its first key, then allocateNewChunk() (chunks are allocated dynamically)</span><br><span class="line">    &#x2F;&#x2F; This will be done only once per chunk</span><br><span class="line">    if (bloomType &#x3D;&#x3D; BloomType.ROWCOL) &#123;</span><br><span class="line">      firstKeyInChunk &#x3D;</span><br><span class="line">          PrivateCellUtil</span><br><span class="line">              
.getCellKeySerializedAsKeyValueKey(PrivateCellUtil.createFirstOnRowCol(cell));</span><br><span class="line">    &#125; else &#123;</span><br><span class="line">      firstKeyInChunk &#x3D; CellUtil.copyRow(cell);</span><br><span class="line">    &#125;</span><br><span class="line">    allocateNewChunk();</span><br><span class="line">  &#125;</span><br><span class="line"> </span><br><span class="line">  &#x2F;&#x2F; The chunk performs the actual hashing and sets the bits</span><br><span class="line">  chunk.add(cell);</span><br><span class="line">  this.prevCell &#x3D; cell;</span><br><span class="line">  ++totalKeyCount;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>BloomFilterChunk implements BloomFilterBase and handles reading and writing the Bloom filter bits; its add method performs the write:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">  public void add(Cell cell) &#123;</span><br><span class="line">    &#x2F;*</span><br><span class="line">     * For faster hashing, use combinatorial generation</span><br><span class="line">     * http:&#x2F;&#x2F;www.eecs.harvard.edu&#x2F;~kirsch&#x2F;pubs&#x2F;bbbf&#x2F;esa06.pdf</span><br><span class="line">     *&#x2F;</span><br><span class="line">    int hash1;</span><br><span class="line">    int hash2;</span><br><span 
class="line">    HashKey&lt;Cell&gt; hashKey;</span><br><span class="line">    </span><br><span class="line">    &#x2F;&#x2F; Compute two hashes, then set the bit positions derived from them</span><br><span class="line"> </span><br><span class="line">    if (this.bloomType &#x3D;&#x3D; BloomType.ROWCOL) &#123;</span><br><span class="line">      hashKey &#x3D; new RowColBloomHashKey(cell);</span><br><span class="line">      hash1 &#x3D; this.hash.hash(hashKey, 0);</span><br><span class="line">      hash2 &#x3D; this.hash.hash(hashKey, hash1);</span><br><span class="line">    &#125; else &#123;</span><br><span class="line">      hashKey &#x3D; new RowBloomHashKey(cell);</span><br><span class="line">      hash1 &#x3D; this.hash.hash(hashKey, 0);</span><br><span class="line">      hash2 &#x3D; this.hash.hash(hashKey, hash1);</span><br><span class="line">    &#125;</span><br><span class="line">    setHashLoc(hash1, hash2);</span><br><span class="line">  &#125;</span><br></pre></td></tr></table></figure><p><br>The get method looks up a single bit, using the same bit test as BloomFilterUtil.contains above:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">static boolean get(int pos, ByteBuffer bloomBuf, int bloomOffset) &#123;</span><br><span class="line">  </span><br><span class="line">  &#x2F;&#x2F; Locate the bit</span><br><span class="line">  int bytePos &#x3D; pos &gt;&gt; 3; &#x2F;&#x2F;pos &#x2F; 8</span><br><span class="line">  int bitPos &#x3D; pos &amp; 0x7; &#x2F;&#x2F;pos % 8</span><br><span class="line">  &#x2F;&#x2F; TODO access this via Util API which can do Unsafe access if possible(?)</span><br><span class="line">  byte curByte &#x3D; bloomBuf.get(bloomOffset + bytePos);</span><br><span 
class="line">  curByte &amp;&#x3D; BloomFilterUtil.bitvals[bitPos];</span><br><span class="line">  return (curByte !&#x3D; 0);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="Dy7rT"></a></p><h2 id="布隆过滤器的使用"><a href="#布隆过滤器的使用" class="headerlink" title="布隆过滤器的使用"></a>Using the Bloom filter</h2><p>HBase uses the Bloom filters built above in two places. One of them is scanner construction, where HBase checks whether a store file can contain the requested columns or column families.</p><p>When a StoreFileScanner is built, the shouldUseScanner method decides whether the scanner is needed at all, and it calls reader.passesBloomFilter as part of that decision:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">public boolean shouldUseScanner(Scan scan, HStore store, long oldestUnexpiredTS) &#123;</span><br><span class="line">  &#x2F;&#x2F; if the file has no entries, no need to validate or create a scanner.</span><br><span class="line">  byte[] cf &#x3D; store.getColumnFamilyDescriptor().getName();</span><br><span class="line">  TimeRange timeRange &#x3D; scan.getColumnFamilyTimeRange().get(cf);</span><br><span class="line">  if (timeRange &#x3D;&#x3D; null) &#123;</span><br><span class="line">    timeRange &#x3D; scan.getTimeRange();</span><br><span class="line">  &#125;</span><br><span class="line"> </span><br><span class="line">  &#x2F;&#x2F; Check the time range, the start&#x2F;end key range, and the Bloom filter</span><br><span class="line">  return reader.passesTimerangeFilter(timeRange, oldestUnexpiredTS) &amp;&amp; reader</span><br><span class="line">      .passesKeyRangeFilter(scan) &amp;&amp; reader.passesBloomFilter(scan, scan.getFamilyMap().get(cf));</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>StoreFileReader reads the HFile, and the StoreFileScanner is created from it. passesBloomFilter chooses the check to run according to the Bloom filter type of the current HFile; that type is the one defined on the column family when the table was created.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">boolean passesBloomFilter(Scan scan, final SortedSet&lt;byte[]&gt; columns) &#123;</span><br><span class="line">  byte[] row &#x3D; scan.getStartRow();</span><br><span class="line">  switch (this.bloomFilterType) &#123;</span><br><span class="line">    case ROW:</span><br><span class="line">      if (!scan.isGetScan()) &#123;</span><br><span class="line">        return true;</span><br><span class="line">      &#125;</span><br><span class="line">      return passesGeneralRowBloomFilter(row, 0, row.length);</span><br><span class="line"> </span><br><span class="line">    case ROWCOL:</span><br><span class="line">      if (!scan.isGetScan()) &#123;</span><br><span class="line">        return true;</span><br><span class="line">      
&#125;</span><br><span class="line">      if (columns !&#x3D; null &amp;&amp; columns.size() &#x3D;&#x3D; 1) &#123;</span><br><span class="line">        byte[] column &#x3D; columns.first();</span><br><span class="line">        &#x2F;&#x2F; create the required fake key</span><br><span class="line">        Cell kvKey &#x3D; PrivateCellUtil.createFirstOnRow(row, HConstants.EMPTY_BYTE_ARRAY, column);</span><br><span class="line">        return passesGeneralRowColBloomFilter(kvKey);</span><br><span class="line">      &#125;</span><br><span class="line"> </span><br><span class="line">      &#x2F;&#x2F; For multi-column queries the Bloom filter is checked from the</span><br><span class="line">      &#x2F;&#x2F; seekExact operation.</span><br><span class="line">      return true;</span><br><span class="line">    case ROWPREFIX_FIXED_LENGTH:</span><br><span class="line">      return passesGeneralRowPrefixBloomFilter(scan);</span><br><span class="line">    default:</span><br><span class="line">      return true;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>In passesGeneralRowBloomFilter, this.generalBloomFilter is the Bloom filter that was built when the reader was created.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">private boolean passesGeneralRowBloomFilter(byte[] row, int rowOffset, int rowLen) &#123;</span><br><span class="line">  
BloomFilter bloomFilter &#x3D; this.generalBloomFilter;</span><br><span class="line">  if (bloomFilter &#x3D;&#x3D; null) &#123;</span><br><span class="line">    return true;</span><br><span class="line">  &#125;</span><br><span class="line"> </span><br><span class="line">  &#x2F;&#x2F; Used in ROW bloom</span><br><span class="line">  byte[] key &#x3D; null;</span><br><span class="line">  if (rowOffset !&#x3D; 0 || rowLen !&#x3D; row.length) &#123;</span><br><span class="line">    throw new AssertionError(</span><br><span class="line">        &quot;For row-only Bloom filters the row must occupy the whole array&quot;);</span><br><span class="line">  &#125;</span><br><span class="line">  key &#x3D; row;</span><br><span class="line"> </span><br><span class="line">  &#x2F;&#x2F; Check whether the row may be present in this HFile</span><br><span class="line">  return checkGeneralBloomFilter(key, null, bloomFilter);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>checkGeneralBloomFilter then calls contains to make the final decision.</p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;a name=&quot;iMuzZ&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h2 id=&quot;Bloom-filter最优的大小计算&quot;&gt;&lt;a href=&quot;#Bloom-fil
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="HBase" scheme="cpeixin.cn/tags/HBase/"/>
    
  </entry>
  
  <entry>
    <title>Applications of Bloom Filters in HBase</title>
    <link href="cpeixin.cn/2020/09/30/%E5%B8%83%E9%9A%86%E8%BF%87%E6%BB%A4%E5%99%A8%E5%9C%A8HBase%E4%B8%AD%E7%9A%84%E5%BA%94%E7%94%A8/"/>
    <id>cpeixin.cn/2020/09/30/%E5%B8%83%E9%9A%86%E8%BF%87%E6%BB%A4%E5%99%A8%E5%9C%A8HBase%E4%B8%AD%E7%9A%84%E5%BA%94%E7%94%A8/</id>
    <published>2020-09-30T07:18:58.000Z</published>
    <updated>2020-10-07T12:38:25.330Z</updated>
    
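    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><h1 id="布隆过滤器在HBase中的应用"><a href="#布隆过滤器在HBase中的应用" class="headerlink" title="布隆过滤器在HBase中的应用"></a>Applications of Bloom Filters in HBase</h1><p><a name="e4bXj"></a></p><h3 id="块索引机制"><a href="#块索引机制" class="headerlink" title="块索引机制"></a>The block index mechanism</h3><p>Before discussing how HBase uses Bloom filters, let us first look at HBase's block index mechanism. The block index is an inherent feature of HBase: HBase stores its data in HFiles, and each HFile holds sorted &lt;key, value&gt; pairs. <strong>An HFile consists of consecutive blocks; the row key of the first row in each block forms the file's block index, which is stored at the end of the file. When HBase opens an HFile, the block index is loaded into memory first. HBase then performs a binary search over the in-memory block index to find the block that may contain the given key, and reads that block from disk to locate the actual key</strong>.</p><blockquote><p>Note that these blocks are not HDFS blocks. The default HBase block size is 64 KB, and it can be configured as needed: larger blocks are recommended for tables that are mostly read sequentially, smaller blocks for tables that are mostly accessed randomly.</p></blockquote><p>In practice, however, the block index alone is not enough. It helps us find the desired data within a single file quickly, but we may still have to scan many files. Bloom filters exist to solve exactly this problem: they let HBase immediately decide whether a file can contain a given row key, so files that do not need to be scanned can be skipped. As the figure below shows, the block index indicates that every file might contain the row key, while the Bloom filter lets us skip the files that clearly do not.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602052239866-46660810-503f-42d1-bb0c-ade99dc6e519.png#align=left&display=inline&height=565&margin=%5Bobject%20Object%5D&originHeight=565&originWidth=919&size=0&status=done&style=none&width=919" alt><br>As the figure also shows, a Bloom filter cannot tell us which file definitely contains the row key, because its answers are probabilistic. When a Bloom filter says a file does not contain a row, that answer is always correct; when it says a file does contain the row, the answer may be wrong. In other words, HBase may still load some unnecessary blocks. Even so, Bloom filters let us skip files that clearly do not need to be scanned. The false-positive rate can be tuned by adjusting how much space the filter occupies; 1% is a common setting.<br><br><br>Note that enabling Bloom filters does not necessarily speed up an individual get right away: many clients may be sending requests to HBase at the same time, and under heavy load HBase performance is bounded by disk read throughput. What Bloom filters do is reduce unnecessary block loads, which raises the throughput of the cluster as a whole. And because HBase loads fewer blocks, cache churn drops, which in turn improves the read cache hit rate.<br><br><br>Compared with these benefits, the one drawback of Bloom filters, a small amount of extra storage, is minor. There are two questions to consider when using them.<br></p><p><a name="5lpPd"></a></p><h3 id="什么时候应该使用布隆过滤器"><a href="#什么时候应该使用布隆过滤器" class="headerlink" title="什么时候应该使用布隆过滤器"></a>When should you use a Bloom filter?</h3><p>As described above, the main job of a Bloom filter is to help HBase skip underlying files that clearly do not contain the data being looked up. So when the target data is spread evenly across all files (which can happen when users update every row periodically), a Bloom filter brings little benefit and just wastes storage. Conversely, if the data you look up lives in only a small fraction of the files, you should definitely enable a Bloom filter.<br></p><p><a name="bz1Ok"></a></p><h3 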
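id="选择行级还是行加列级布隆过滤器"><a href="#选择行级还是行加列级布隆过滤器" class="headerlink" title="选择行级还是行加列级布隆过滤器"></a>Row-level or row+column-level Bloom filter?</h3><p>Bloom filters are an advanced HBase feature that can reduce lookup time for certain access patterns (get/scan). Because they add memory and storage overhead, older HBase releases disabled them by default; since HBase 0.96, the default type for a column family is ROW.</p><p>HBase supports the following Bloom filter types:<br>1. NONE: no Bloom filter<br>2. ROW: a Bloom filter on the row key<br>3. ROWCOL: a Bloom filter on the row key plus column qualifier<br>ROWCOL is the finer-grained mode.</p><p>Because the row+column filter is finer-grained, it naturally takes more storage. So if you always read entire rows, a row-level Bloom filter is sufficient. Whenever you have the choice, prefer the row-level filter, because it strikes a better balance between the extra space it costs and the performance gained by filtering out store files.</p><p>Example for ROW: suppose there are two HFiles, hf1 and hf2. hf1 contains kv1 (r1 cf:q1 v) and kv2 (r2 cf:q1 v); hf2 contains kv3 (r3 cf:q1 v) and kv4 (r4 cf:q1 v). If the column family's BLOOMFILTER attribute is set to ROW, get(r1) skips hf2 and get(r3) skips hf1.</p><p>Example for ROWCOL: suppose hf1 contains kv1 (r1 cf:q1 v) and kv2 (r2 cf:q1 v), and hf2 contains kv3 (r1 cf:q2 v) and kv4 (r2 cf:q2 v). With BLOOMFILTER set to ROW, both get(r1,q1) and get(r1,q2) read hf1 and hf2; with BLOOMFILTER set to ROWCOL, get(r1,q1) skips hf2 and get(r1,q2) skips hf1.<br><code>Note: ROW and ROWCOL are related only by name. ROWCOL is not an extension of ROW and cannot replace it.</code><br></p><p><a name="wa7Lo"></a></p><h3 id="布隆过滤器的存储在哪"><a href="#布隆过滤器的存储在哪" class="headerlink" title="布隆过滤器的存储在哪"></a>Where is the Bloom filter stored?</h3><p>Once a Bloom filter is enabled, HBase includes the Bloom filter data in every StoreFile (HFile) it writes, in the form of a MetaBlock. MetaBlocks are cached by the LRUBlockCache together with DataBlocks (the actual KeyValue data). Enabling Bloom filters therefore costs some storage and some block-cache memory, but in most cases that cost is acceptable compared with the benefit.<br><a name="f18ob"></a></p><p><a name="Dig9X"></a></p><h3 id="采用布隆过滤器后，hbase如何get数据"><a href="#采用布隆过滤器后，hbase如何get数据" class="headerlink" title="采用布隆过滤器后，hbase如何get数据"></a>How HBase performs a get with Bloom filters enabled</h3><p>When reading data, HBase queries the MemStore and the HFiles; for each HFile it first consults that file's Bloom filter and skips the file if the filter rules the key out.</p><p><a name="2I3Ml"></a></p><h3 id="采用布隆过滤器后，hbase如何Scan数据"><a href="#采用布隆过滤器后，hbase如何Scan数据" class="headerlink" title="采用布隆过滤器后，hbase如何Scan数据"></a>How HBase performs a scan with Bloom filters enabled</h3><p>A get uses the Bloom filter to eliminate StoreFiles it will not need. When a scan is initialized (a get is wrapped as a scan), each StoreFile goes through a shouldSeek check; if it returns false, the StoreFile does not contain the data being looked up and is skipped:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 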
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">if (memOnly &#x3D;&#x3D; false    </span><br><span class="line">            &amp;&amp; ((StoreFileScanner) kvs).shouldSeek(scan, columns)) &#123;    </span><br><span class="line">          scanners.add(kvs);    </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>In shouldSeek, a scan that is not a get returns true immediately, meaning the file cannot be skipped; for a get, the check depends on the Bloom filter type:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">if (!scan.isGetScan()) &#123;    </span><br><span class="line">        return true;    </span><br><span class="line">&#125;    </span><br><span class="line">byte[] row &#x3D; scan.getStartRow();    </span><br><span class="line">switch (this.bloomFilterType) &#123;    </span><br><span class="line">  case ROW:    </span><br><span class="line">    return passesBloomFilter(row, 0, row.length, null, 0, 0);    </span><br><span class="line">   </span><br><span class="line">  case ROWCOL:    </span><br><span class="line">    if (columns !&#x3D; null &amp;&amp; columns.size() &#x3D;&#x3D; 1) &#123;    </span><br><span class="line">      byte[] column &#x3D; columns.first();    </span><br><span class="line">      return passesBloomFilter(row, 0, row.length, 
column, 0, column.length);    </span><br><span class="line">    &#125;    </span><br><span class="line">    &#x2F;&#x2F; For multi-column queries the Bloom filter is checked from the    </span><br><span class="line">    &#x2F;&#x2F; seekExact operation.    </span><br><span class="line">    return true;    </span><br><span class="line">   </span><br><span class="line">  default:    </span><br><span class="line">    return true;  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>With a ROWCOL Bloom filter configured, a scan that specifies qualifiers can eliminate the StoreFiles it does not need.<br>Scans or gets that specify qualifiers are checked in seekExactly:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">&#x2F;&#x2F; Seek all scanners to the start of the Row (or if the exact matching row    </span><br><span class="line">&#x2F;&#x2F; key does not exist, then to the start of the next matching Row).    
</span><br><span class="line">if (matcher.isExactColumnQuery()) &#123;    </span><br><span class="line">  for (KeyValueScanner scanner : scanners)    </span><br><span class="line">    scanner.seekExactly(matcher.getStartKey(), false);    </span><br><span class="line">&#125; else &#123;    </span><br><span class="line">  for (KeyValueScanner scanner : scanners)    </span><br><span class="line">    scanner.seek(matcher.getStartKey());    </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If the Bloom filter misses, a fake KeyValue that sorts last on the row/column is created, signaling that this StoreFile does not need to be actually scanned:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">public boolean seekExactly(KeyValue kv, boolean forward)    </span><br><span class="line">      throws IOException &#123;    </span><br><span class="line">    if (reader.getBloomFilterType() !&#x3D; StoreFile.BloomType.ROWCOL ||    </span><br><span class="line">        kv.getRowLength() &#x3D;&#x3D; 0 || kv.getQualifierLength() &#x3D;&#x3D; 0) &#123;    </span><br><span class="line">      return forward ? 
reseek(kv) : seek(kv);    </span><br><span class="line">    &#125;    </span><br><span class="line">    </span><br><span class="line">    boolean isInBloom &#x3D; reader.passesBloomFilter(kv.getBuffer(),    </span><br><span class="line">        kv.getRowOffset(), kv.getRowLength(), kv.getBuffer(),    </span><br><span class="line">        kv.getQualifierOffset(), kv.getQualifierLength());    </span><br><span class="line">    if (isInBloom) &#123;    </span><br><span class="line">      &#x2F;&#x2F; This row&#x2F;column might be in this store file. Do a normal seek.    </span><br><span class="line">      return forward ? reseek(kv) : seek(kv);    </span><br><span class="line">    &#125;    </span><br><span class="line">    </span><br><span class="line">    &#x2F;&#x2F; Create a fake key&#x2F;value, so that this scanner only bubbles up to the top    </span><br><span class="line">    &#x2F;&#x2F; of the KeyValueHeap in StoreScanner after we scanned this row&#x2F;column in    </span><br><span class="line">    &#x2F;&#x2F; all other store files. The query matcher will then just skip this fake    </span><br><span class="line">    &#x2F;&#x2F; key&#x2F;value and the store scanner will progress to the next column.    </span><br><span class="line">    cur &#x3D; kv.createLastOnRowCol();    </span><br><span class="line">    return true;    </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Why can only a ROWCOL Bloom filter eliminate StoreFiles here? Simply because a scan covers a range: a miss in a ROW Bloom filter only tells us that this particular row key is not in the StoreFile, but the next row key may still be.<br><br><br>A ROWCOL Bloom filter is different: a miss means that the qualifier is not in this StoreFile, so the scan does not need to read this StoreFile for it.<br><br><br>To summarize:<br>1. For any kind of get (by row key, or by row + column), the Bloom filter takes effect; what matters is that the type of get matches the Bloom filter type.<br>2. A row-based scan cannot be optimized this way.<br>3. A scan that specifies row + column qualifiers can skip StoreFiles that do not contain those qualifiers, which is a worthwhile optimization. Specifying qualifiers also reduces network traffic, so scans should name their qualifiers whenever possible.<br></p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;h1 id=&quot;布隆过滤器在HBase中的应用&quot;&gt;&lt;a href=&quot;#布隆过滤器在HBase中的应用&quot; class=&quot;headerlink&quot; titl
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="HBase" scheme="cpeixin.cn/tags/HBase/"/>
    
  </entry>
  
  <entry>
    <title>Exploring B+ Trees and LSM-Trees</title>
    <link href="cpeixin.cn/2020/09/27/%E6%8E%A2%E7%B4%A2B-%E6%A0%91%E5%92%8CLSM-Tree/"/>
    <id>cpeixin.cn/2020/09/27/%E6%8E%A2%E7%B4%A2B-%E6%A0%91%E5%92%8CLSM-Tree/</id>
    <published>2020-09-27T03:07:25.000Z</published>
    <updated>2020-10-07T03:08:50.702Z</updated>
    
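    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p>Today let us look at the underlying storage mechanisms of the relational database MySQL and the NoSQL storage engine HBase. How data is organized is crucial to a database's performance. Most database data is stored on disk, and disk access is far more time-consuming than memory access, as the comparison below shows. One key to faster lookups is therefore to minimize the number of disk accesses.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602035944493-a6384c4e-12c8-4453-96b0-0c2be1a78eef.png#align=left&display=inline&height=512&margin=%5Bobject%20Object%5D&name=image.png&originHeight=512&originWidth=640&size=158633&status=done&style=none&width=640" alt="image.png"><br>To speed up data access, most traditional relational databases use special data structures, called indexes, to help locate data. Because traditional relational databases often need range queries over a batch of data, their indexes generally use tree structures rather than hashing. There are many kinds of trees, however, and not all of them are suitable as database indexes.<br></p><p><a name="AoiGD"></a></p><h3 id="二叉查找树与平衡二叉树"><a href="#二叉查找树与平衡二叉树" class="headerlink" title="二叉查找树与平衡二叉树"></a>Binary search trees and balanced binary trees</h3><p>The most common tree structure is the binary search tree (BST), an ordered binary tree in which, for every node, all values in its left subtree are smaller than the node's value and all values in its right subtree are larger. It is simple to implement, and lookups take O(log n) when the tree is balanced; in the extreme unbalanced case, however, lookups degrade to O(n), so index efficiency is hard to guarantee.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602036281601-3cf48849-6485-4028-9d6b-dc3a9cfcbb5e.png#align=left&display=inline&height=325&margin=%5Bobject%20Object%5D&name=image.png&originHeight=342&originWidth=640&size=102079&status=done&style=none&width=609" alt="image.png"><br><br><br>To address this weakness, the natural idea is to use a balanced binary tree. But a balanced binary tree still has a significant problem: its height is about log n, and for an index tree, a taller tree means more node accesses per lookup and thus lower query efficiency.<br><br><br>Moreover, main memory reads data from disk in pages, so each disk access reads several sectors (for example, 4 KB of data), far more than the value stored in a single binary-tree node (a few bytes). This is another reason binary trees are inefficient as index trees. This led to the idea of increasing the degree (fan-out) of each tree node to improve access efficiency, and the B+ tree drew more attention.<br><br><br>B+ trees<br>In traditional relational databases, the B+ tree and its variants are the most widely used index trees.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602036521833-086f5378-40c3-4c35-b0dc-4fba25763d44.png#align=left&display=inline&height=172&margin=%5Bobject%20Object%5D&name=image.png&originHeight=344&originWidth=640&size=163098&status=done&style=none&width=320" 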
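alt="image.png"><br>The main characteristics of a B+ tree are as follows.<br></p><ul><li>Internal nodes store only keys, not values; the values are stored in the leaf nodes. This gives each node a large degree and keeps the tree shallow, which improves query efficiency.<br></li><li>Leaf nodes store the values in key order and carry pointers to their neighboring leaves, which makes range queries efficient. All leaves are the same distance from the root, so every lookup costs about the same.</li><li>Unlike in a binary tree, updates in a B+ tree are applied at the leaf level and the tree grows from the bottom up, so it keeps itself balanced at relatively low cost during updates.<br></li></ul><p>These strengths made the B+ tree the darling of traditional relational databases. It is not flawless, though. Having covered its advantages, let us look at its drawbacks. The main one is that as inserts keep arriving, leaf nodes gradually split, which can leave logically contiguous data stored in different physical disk blocks. Range queries then incur heavy disk I/O, which can seriously hurt performance.<br></p><p><a name="rjOrM"></a></p><h3 id="日志结构合并树"><a href="#日志结构合并树" class="headerlink" title="日志结构合并树"></a>Log-structured merge-trees</h3><p>As noted earlier, most database data is stored on disk, and for both traditional hard disk drives (HDD) and solid state drives (SSD), sequential reads and writes are much faster than random ones.<br><br><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602036755343-063a9e8c-37ca-42f1-8a3b-1c5332a878a2.png#align=left&display=inline&height=241&margin=%5Bobject%20Object%5D&name=image.png&originHeight=482&originWidth=640&size=134775&status=done&style=none&width=320" alt="image.png"><br>B+ tree indexes, however, run counter to this basic property of disks, because they require many random disk reads and writes. So in 1992 a new approach to index structures, the log-structured method, emerged. Its main idea is to treat the disk as one large log and always append new data and its index structures to the end of the log, so that disk operations are sequential and index performance improves. The log-structured method has an obvious drawback, however: random reads are very inefficient.<br><br><br>In 1996, the paper The Log-Structured Merge-Tree (LSM-Tree) introduced the log-structured merge-tree. It keeps the advantages of the log-structured method while overcoming its poor random-read performance by keeping data files pre-sorted. Although the LSM-tree was novel and clearly advantageous, it only became widely known ten years later, in 2006, when Google published Bigtable: A Distributed Storage System for Structured Data, a paper built on LSM-tree techniques that swept through the distributed data-processing field. Two well-known open-source big data projects then grew directly out of its ideas (<strong>HBase</strong> in 2007 and Cassandra in 2008, both now Apache top-level projects), reshaping the landscape of big data infrastructure and greatly popularizing LSM-tree techniques.<br><br><br>The defining feature of an LSM-tree is that it stores data in two tree-like structures, both of which serve queries. One (the C0 tree) lives in an in-memory buffer, usually called the memtable; it accepts new inserts, updates, and reads, and sorts the data directly in memory. The other (the C1 tree) lives on disk, usually as sstables; it is produced by flushing the in-memory C0 tree to disk, mainly serves reads, and is sorted and immutable.<br><img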
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602036914181-a0a780fc-339e-4bec-b7ec-1659f90186fa.png#align=left&display=inline&height=108&margin=%5Bobject%20Object%5D&name=image.png&originHeight=216&originWidth=640&size=29947&status=done&style=none&width=320" alt="image.png"><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602036901592-47c98515-f5dd-4527-b553-b30348251581.png#align=left&display=inline&height=956&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1911&originWidth=4411&size=854644&status=done&style=none&width=2205.5" alt="image.png"><br>Another key feature of the LSM-tree is that, besides the two tree-like structures, it uses a log file (usually called the commit log) to guarantee data recovery. The three structures typically cooperate as follows: every insert and update is first recorded in the commit log (this is called WAL, Write-Ahead Log), then written to the memtable; when certain conditions are met, data is flushed from the memtable to an sstable and the corresponding log data is discarded. The memtable and sstables serve queries together, and if something goes wrong with the memtable, its data can be recovered from the commit log and the sstables.<br><br><br>HBase's architecture illustrates these LSM-tree characteristics. Following the WAL principle, data is first written to HBase's HLog (the equivalent of the commit log), then to the MemStore (the equivalent of the memtable), and finally flushed to StoreFiles on disk (the equivalent of sstables). This is how an HBase HRegionServer uses the LSM-tree to produce its data files. The figure below shows HBase's LSM-tree architecture.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602037777657-b0d08b88-d167-40bc-8992-0e2ff8550550.png#align=left&display=inline&height=385&margin=%5Bobject%20Object%5D&name=image.png&originHeight=385&originWidth=640&size=322113&status=done&style=none&width=640" alt="image.png"><br>This structure makes writes very fast (in theory close to sequential disk write speed) but is unfavorable for reads, because a read may in theory have to consult the memtable and every sstable on disk, which clearly hurts performance. To mitigate this, LSM-trees mainly take the following measures.</p><ul><li>Periodically merge small sstables on disk into larger ones (an operation usually called merge or compaction) to reduce the number of sstables. Ordinary updates and deletes do not modify existing data files; they are simply recorded as new entries, and duplicate or superseded entries are only resolved when sstables are merged.<br></li><li><strong>Use a Bloom filter for each sstable to quickly determine whether data may exist in that sstable, reducing total query time.</strong><br></li></ul><p><a name="ti7WV"></a></p><h3 id="总结"><a href="#总结" class="headerlink" 
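title="总结"></a>Summary</h3><p>The difference between LSM-trees and B+ trees is essentially a trade-off between read and write performance, with each design adding compensating mechanisms for what it gives up. A B+ tree storage engine supports not only insert, delete, read, and update of individual records but also sequential scans (via the pointers between leaf nodes); relational databases are the typical systems built on it. As writes increase, however, maintaining the B+ tree structure causes node splits, random disk I/O becomes more likely, and performance gradually degrades. An LSM-tree (Log-Structured Merge-Tree) storage engine supports the same insert, delete, read, update, and sequential scan operations, but uses batched writes to avoid random disk writes. Every choice has its trade-offs: compared with a B+ tree, an LSM-tree gives up some read performance in exchange for much higher write performance.</p><!-- rebuild by neat -->]]></content>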
    
    <summary type="html">
    
      
      
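        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;Today let us look at the underlying storage mechanisms of the relational database MySQL and the NoSQL storage engine HBase. How data is organized is crucial to a database's performance. As we all know,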
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="HBase" scheme="cpeixin.cn/tags/HBase/"/>
    
  </entry>
  
  <entry>
    <title>Kafka Compression Algorithms</title>
    <link href="cpeixin.cn/2020/09/13/Kafka-%E5%8E%8B%E7%BC%A9%E7%AE%97%E6%B3%95/"/>
    <id>cpeixin.cn/2020/09/13/Kafka-%E5%8E%8B%E7%BC%A9%E7%AE%97%E6%B3%95/</id>
    <published>2020-09-13T13:58:54.000Z</published>
    <updated>2020-09-13T14:00:26.778Z</updated>
    
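    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p>You are surely no stranger to compression. It follows the classic trade-off of spending time to save space: specifically, it spends CPU time to reduce disk space or network I/O, in the hope that a small CPU cost buys less disk usage or less network traffic. In Kafka, compression serves exactly this purpose.<br></p><p><a name="iie7v"></a></p><h3 id="怎么压缩？"><a href="#怎么压缩？" class="headerlink" title="怎么压缩？"></a>How Does Kafka Compress?</h3><p>How does Kafka compress messages? To answer that, we have to start with Kafka's message format. Kafka currently has two major message formats, which the community calls V1 and V2; V2 was officially introduced in Kafka 0.11.0.0. In both versions, messages are organized in two layers: the message set and the message. A message set contains a number of record items, and the record items are where the messages are actually encapsulated.<br><br><br>Kafka's underlying message log is made up of a series of message-set record items. Kafka usually does not operate on individual messages directly; it always writes at the message-set level. So why did the community introduce V2? V2 mainly fixes several shortcomings of V1. Which of those fixes are relevant to today's topic?<br><br><br>The first is that fields common to all messages are extracted into the outer message set, so that each message no longer has to store them. Here is an example. In V1, every message carried its own CRC checksum, but in some cases a message's CRC changes. For instance, the broker may update a message's timestamp field, and the recomputed CRC changes accordingly; likewise, when the broker converts message formats (mainly for compatibility with older clients), the CRC changes too. Given this, a CRC on every single message is largely unnecessary: it wastes both space and CPU time. Therefore, <strong>in V2, the CRC check was moved up to the message-set level</strong>.<br><br><br><strong>The checksum exists so that the broker does not accept messages corrupted in network transit, so the check belongs on the broker (the server side), not on the producer (the client side).</strong><br><br><br>V2 also brings an improvement closely tied to compression: the way compressed messages are stored has changed. In V1, several messages were compressed together and the result was stored in the body field of an outer wrapper message; V2 instead compresses the entire message set. The latter should clearly compress better. I ran a simple test on both versions, and under identical conditions V2 used less disk space than V1 whether or not compression was enabled. With compression enabled, the savings were even more pronounced, as shown below:<br><br><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1600001833492-12e5a70b-c4f9-4962-81ed-244100290ce2.png#align=left&display=inline&height=273&margin=%5Bobject%20Object%5D&name=image.png&originHeight=404&originWidth=1016&size=25962&status=done&style=none&width=687" alt="image.png"><br></p><p><a name="oNKEh"></a></p><h3 id="何时压缩？"><a href="#何时压缩？" class="headerlink" title="何时压缩？"></a>When Does Compression Happen?</h3><p><br>In Kafka, compression can happen in two places: on the producer and on the broker. Setting the compression.type parameter in the producer enables the specified compression algorithm. For example, the following code builds a Producer object with GZIP enabled:<br></p><figure class="highlight plain"><table><tr><td 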
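class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">import java.util.Properties;</span><br><span class="line">import org.apache.kafka.clients.producer.KafkaProducer;</span><br><span class="line">import org.apache.kafka.clients.producer.Producer;</span><br><span class="line"></span><br><span class="line">Properties props &#x3D; new Properties();</span><br><span class="line">props.put(&quot;bootstrap.servers&quot;, &quot;localhost:9092&quot;);</span><br><span class="line">props.put(&quot;acks&quot;, &quot;all&quot;);</span><br><span class="line">props.put(&quot;key.serializer&quot;, &quot;org.apache.kafka.common.serialization.StringSerializer&quot;);</span><br><span class="line">props.put(&quot;value.serializer&quot;, &quot;org.apache.kafka.common.serialization.StringSerializer&quot;);</span><br><span class="line">&#x2F;&#x2F; Enable GZIP compression</span><br><span class="line">props.put(&quot;compression.type&quot;, &quot;gzip&quot;);</span><br><span class="line"></span><br><span class="line">Producer&lt;String, String&gt; producer &#x3D; new KafkaProducer&lt;&gt;(props);</span><br></pre></td></tr></table></figure><p><br>The key line here is props.put("compression.type", "gzip"), which tells this Producer to use GZIP. Every message set the Producer produces after startup is then GZIP-compressed, which saves network bandwidth as well as disk space on the Kafka brokers.<br><br><br>Enabling compression on the producer is a natural idea, so why do I say the broker may compress as well? In most cases, the broker simply stores the messages it receives from the producer as-is, without modifying them, but "most cases" only holds under certain conditions. There are two exceptions in which the broker may recompress messages.<br><br><br><strong>Case 1</strong>: the broker specifies a different compression algorithm from the producer.<br><br><br>Here is an example. Imagine the following conversation.<br><br><br>Producer: "I'm going to compress with GZIP."<br>Broker: "Sorry, the messages I accept must be compressed with Snappy."<br><br><br>In this case, after receiving GZIP-compressed messages, the broker has no choice but to decompress them and recompress them with Snappy. If you look at the Kafka documentation, you will find that the broker also has a parameter named compression.type, the same name as in the example above. Its default value, however, is producer, meaning the broker "respects" whatever algorithm the producer used. Once you set a different compression.type on the broker, be careful: unexpected compression / decompression may occur, which usually shows up as a spike in broker
CPU usage.<br><br><br><strong>Case 2</strong>: the broker performs message format conversion. Format conversion exists mainly for compatibility with older consumers. Remember the V1 and V2 formats discussed earlier? In production it is very common for a Kafka cluster to hold messages in several format versions at once. To stay compatible with the old format, the broker converts new-format messages to the old format, which involves decompressing and recompressing them. Such conversion generally has a large performance impact: besides the compression cost, it also costs Kafka its much-vaunted Zero Copy feature.<br><br><br>"Zero Copy" means that when data is transferred between disk and network, expensive copies of the data between kernel space and user space are avoided, which makes the transfer fast. If Kafka loses this feature, performance inevitably suffers, so keep message formats uniform wherever possible: this not only avoids unnecessary decompression / recompression but also benefits performance in other respects. If you are interested, dig deeper into how Zero Copy works.<br></p><p><a name="UQsjo"></a></p><h3 id="何时解压缩？"><a href="#何时解压缩？" class="headerlink" title="何时解压缩？"></a>When Does Decompression Happen?</h3><p>Where there is compression, there is decompression. Decompression usually happens in the consumer: the producer sends compressed messages to the broker, which accepts them and stores them as-is. When a consumer requests those messages, the broker again sends them out unchanged, and once they reach the consumer, the consumer decompresses them back into the original messages. This raises a question: how does the consumer know which algorithm the messages were compressed with?<br><br><br>The answer is in the messages themselves. Kafka records which compression algorithm was used in the message set, so when the consumer reads a message set, it knows which algorithm the messages use. If you remember one sentence about compression and decompression, make it this one: <strong>the producer compresses, the broker preserves, and the consumer decompresses</strong>.<br><br><br>Besides the consumer, the broker also decompresses. Note that this is a different scenario from the decompression during format conversion described above. <strong>Every compressed message set is decompressed when the broker writes it, in order to run various validations on the messages.</strong> Admittedly, this decompression does affect broker performance, especially CPU usage.<br><br><br>In fact, engineers at JD.com recently proposed a bugfix to the community that removes the decompression done for validation. According to them, removing it cut broker CPU usage by at least 50%. Unfortunately, the community has not adopted the proposal so far, because this validation is very important and cannot be dropped blindly. After all, doing things correctly comes first; only then should we make them better and faster. For this scenario, you might consider whether there is a solution that gets the best of both worlds: validating messages without decompressing them.<br></p><p><a name="tUfsS"></a></p><h3 id="各种压缩算法对比"><a href="#各种压缩算法对比" class="headerlink" title="各种压缩算法对比"></a>Comparing Compression Algorithms</h3><p>Now for the main event: the compression algorithms themselves. After all this background, we still need to compare their strengths and weaknesses so that we can configure a compression strategy suited to our workload. Before version 2.1.0, Kafka supported three compression algorithms: GZIP, Snappy, and LZ4.<br><br><br>Starting with 2.1.0, Kafka officially supports the Zstandard algorithm (zstd for short), a compression algorithm open-sourced by Facebook that offers a very high compression ratio. Two metrics matter when judging a compression algorithm. One is the compression ratio: if something that took 100 units of space takes 20 units after compression, the ratio is 100 / 20 = 5, and higher is better. The other is compression / decompression throughput, for example how many MB of data can be compressed or decompressed per second; again, higher is better. The table below is a compression benchmark published on Facebook's Zstandard website:<br><br><br><img 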
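src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1600002495728-f2d5375e-82c8-4e00-bcd1-92a9bb01bcf2.png#align=left&display=inline&height=346&margin=%5Bobject%20Object%5D&name=image.png&originHeight=346&originWidth=411&size=33818&status=done&style=none&width=411" alt="image.png"><br>The table shows that zstd has the highest compression ratio, while its throughput is merely average. LZ4, by contrast, is the undisputed leader in throughput. I won't dwell on how authoritative these numbers are; I only use them to illustrate the rough performance of today's compression algorithms.<br><br><br>In practice, GZIP, Snappy, LZ4, and even zstd each have their strengths. For Kafka, however, benchmark results are remarkably consistent: in throughput, LZ4 &gt; Snappy &gt; zstd and GZIP; in compression ratio, zstd &gt; LZ4 &gt; GZIP &gt; Snappy. In terms of physical resources, Snappy uses the most network bandwidth and zstd the least, which makes sense since zstd is designed for a very high compression ratio. CPU usage is similar across the algorithms, except that Snappy uses somewhat more CPU when compressing, and GZIP may use more CPU when decompressing.<br></p><p><a name="799NN"></a></p><h3 id="最佳实践"><a href="#最佳实践" class="headerlink" title="最佳实践"></a>Best Practices</h3><p>With this comparison in hand, we can enable a suitable compression algorithm for our own situation. First, compression. When is it appropriate to enable it? You now know that compression is done on the producer, so one precondition is that the machine running the producer has plenty of spare CPU. If the producer machine's CPU is already exhausted, enabling compression only makes things worse.<br><br><br>Besides spare CPU, I also recommend enabling compression if bandwidth is limited in your environment. In fact, many of the Kafka production environments I have seen have run into saturated bandwidth. Nowadays bandwidth is scarcer than CPU or memory: 10-Gigabit networking is not yet standard at ordinary companies, so Kafka clusters on Gigabit networks exhaust their bandwidth especially easily. If your client machines have plenty of spare CPU, I strongly recommend enabling zstd compression, which can greatly reduce network usage.<br><br><br>Next, decompression. There isn't much to say: once compression is enabled, decompression is unavoidable. I want to stress just one point: we cannot avoid the necessary decompression, but we can at least avoid the unexpected kind. As mentioned earlier, decompression introduced for compatibility with older versions falls into this category. Where possible, make sure no message format conversion takes place.</p><!-- rebuild by neat -->]]></content>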
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;You are surely no stranger to compression. It follows the classic trade-off of spending time to save space: specifically, it spends CPU time to
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="kafka" scheme="cpeixin.cn/tags/kafka/"/>
    
  </entry>
  
  <entry>
    <title>Does Serialization Save Space in Spark?</title>
    <link href="cpeixin.cn/2020/09/03/%E5%BA%8F%E5%88%97%E5%8C%96%E5%9C%A8Spark%E4%B8%AD%E8%8A%82%E7%BA%A6%E7%A9%BA%E9%97%B4%E4%BA%86%E4%B9%88%EF%BC%9F/"/>
    <id>cpeixin.cn/2020/09/03/%E5%BA%8F%E5%88%97%E5%8C%96%E5%9C%A8Spark%E4%B8%AD%E8%8A%82%E7%BA%A6%E7%A9%BA%E9%97%B4%E4%BA%86%E4%B9%88%EF%BC%9F/</id>
    <published>2020-09-03T15:58:14.000Z</published>
    <updated>2020-09-13T16:00:09.807Z</updated>
    
    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><a name="5hXkC"></a></p><h3 id="序列化在Spark中的用处"><a href="#序列化在Spark中的用处" class="headerlink" title="序列化在Spark中的用处"></a>Where Serialization Is Used in Spark</h3><ul><li><p>When an operator function uses an external variable, that variable is serialized and sent over the network.</p></li><li><p>When a custom type is used as the element type of an RDD (for example JavaRDD&lt;Student&gt;, where Student is a custom class), every object of that type is serialized. In this case the custom class must therefore implement the Serializable interface.<br></p></li><li><p>When a serialized storage level is used for persistence (such as MEMORY_ONLY_SER), Spark serializes each partition of the RDD into one large byte array.<br><br><a name="8XhQ2"></a></p><h3 id="Java序列化与Kryo序列化对比"><a href="#Java序列化与Kryo序列化对比" class="headerlink" title="Java序列化与Kryo序列化对比"></a>Java Serialization vs. Kryo Serialization</h3></li><li><p>Java serialization: objects are serialized with ObjectOutputStream, and any class that implements java.io.Serializable can be serialized. Java serialization is very flexible but slow, and it produces large serialized output for many classes.</p></li><li><p>Kryo serialization: Kryo is faster and more compact than Java serialization, but it does not support every serializable type, and for best performance the classes should be registered with it in the application in advance.<br><br><a name="a5JvZ"></a></p><h3 id="kryo启动方式"><a href="#kryo启动方式" class="headerlink" title="kryo启动方式"></a>Enabling Kryo</h3><p>Set it on the SparkConf:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">new SparkConf().set(&quot;spark.serializer&quot;, &quot;org.apache.spark.serializer.KryoSerializer&quot;)</span><br></pre></td></tr></table></figure></li></ul><p><a name="XH3iN"></a></p><h3 id="调优"><a href="#调优" class="headerlink" title="调优"></a>Tuning</h3><ul><li>spark.kryoserializer.buffer: the initial size of Kryo's serialization buffer, 64k by default. There is one buffer per core on each worker, and it grows up to spark.kryoserializer.buffer.max if needed.<br></li><li>spark.kryoserializer.buffer.max: the maximum allowed size of that buffer, 64m by default. It must be larger than any object you serialize and less than 2048m.<br></li></ul><p><a name="L5SRq"></a></p><h3 id="测试"><a href="#测试" class="headerlink" title="测试"></a>Tests</h3><p>Plain MEMORY_ONLY (no serialization)</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span 
class="line">val persons&#x3D;new ArrayBuffer[Person]</span><br><span class="line">for(i&lt;-1 to 1000000)</span><br><span class="line">&#123;</span><br><span class="line">persons+&#x3D;(Person(&quot;name&quot;+i,10+i,&quot;male&quot;,&quot;haerbin&quot;))</span><br><span class="line">&#125;</span><br><span class="line">val personrdd&#x3D;sc.parallelize(persons)</span><br><span class="line">personrdd.persist(StorageLevel.MEMORY_ONLY)</span><br><span class="line">personrdd.count()</span><br></pre></td></tr></table></figure><p><strong>Result: 95.3 MB</strong><br><br><br>MEMORY_ONLY_SER with Java serialization</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">val persons&#x3D;new ArrayBuffer[Person]</span><br><span class="line">for(i&lt;-1 to 1000000)</span><br><span class="line">&#123;</span><br><span class="line">persons+&#x3D;(Person(&quot;name&quot;+i,10+i,&quot;male&quot;,&quot;haerbin&quot;))</span><br><span class="line">&#125;</span><br><span class="line">val personrdd&#x3D;sc.parallelize(persons)</span><br><span class="line">personrdd.persist(StorageLevel.MEMORY_ONLY_SER)</span><br><span class="line">personrdd.count()</span><br></pre></td></tr></table></figure><p><strong>Result: 39.8 MB</strong><br><br><br>MEMORY_ONLY_SER with Kryo enabled but the class not registered</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">val persons&#x3D;new ArrayBuffer[Person]</span><br><span 
class="line">for(i&lt;-1 to 1000000)</span><br><span class="line">&#123;</span><br><span class="line">persons+&#x3D;(Person(&quot;name&quot;+i,10+i,&quot;male&quot;,&quot;haerbin&quot;))</span><br><span class="line">&#125;</span><br><span class="line">val personrdd&#x3D;sc.parallelize(persons)</span><br><span class="line">personrdd.persist(StorageLevel.MEMORY_ONLY_SER)</span><br><span class="line">personrdd.count()</span><br></pre></td></tr></table></figure><p><strong>Result: 119.1 MB</strong>. This happens because the class was not registered: Kryo then has to store the full class name with every object, which takes up a lot of memory.<br><br><br>MEMORY_ONLY_SER with the class registered with Kryo, packaged and run on the cluster (this cannot be shown in spark-shell, because the shell's SparkContext already exists and Kryo classes must be registered on the SparkConf before the context is created)</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">import org.apache.spark.&#123;SparkConf, SparkContext&#125;</span><br><span class="line">import org.apache.spark.storage.StorageLevel</span><br><span class="line">import scala.collection.mutable.ArrayBuffer</span><br><span class="line">&#x2F;&#x2F; Register classes on the SparkConf before the SparkContext is created:</span><br><span class="line">&#x2F;&#x2F; sc.getConf returns a copy, so registering on it afterwards has no effect</span><br><span class="line">val conf &#x3D; new SparkConf().set(&quot;spark.serializer&quot;, &quot;org.apache.spark.serializer.KryoSerializer&quot;).registerKryoClasses(Array(classOf[Person]))</span><br><span class="line">val sc &#x3D; new SparkContext(conf)</span><br><span class="line">val persons&#x3D;new ArrayBuffer[Person]</span><br><span class="line">for(i&lt;-1 to 1000000)</span><br><span class="line">&#123;</span><br><span class="line">persons+&#x3D;(Person(&quot;name&quot;+i,10+i,&quot;male&quot;,&quot;haerbin&quot;))</span><br><span class="line">&#125;</span><br><span class="line">val personrdd&#x3D;sc.parallelize(persons)</span><br><span class="line">personrdd.persist(StorageLevel.MEMORY_ONLY_SER)</span><br><span class="line">personrdd.count()</span><br><span class="line">&#x2F;&#x2F; Package the application and submit the jar with spark-submit:</span><br><span class="line">spark-submit \</span><br><span class="line">--class sparkcore.SerializerApp \</span><br><span class="line">--name SerializerApp \</span><br><span class="line">--master yarn \</span><br><span class="line"> &#x2F;home&#x2F;hadoop&#x2F;lib2&#x2F;scala6-1.0.jar</span><br></pre></td></tr></table></figure><p><strong>Result: 27.5 MB</strong>. The outcome is clear: Kryo with registered classes is the most compact.<br></p><p><a name="cn6fo"></a></p><h3 id="序列化时间"><a href="#序列化时间" class="headerlink" title="序列化时间"></a>Serialization Time</h3><p>Entity class</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line">import java.io.Serializable;</span><br><span class="line">import java.util.Map;</span><br><span class="line"></span><br><span class="line">public class Simple  implements Serializable</span><br><span class="line">&#123;  </span><br><span class="line">     private static final long serialVersionUID &#x3D; 
-4914434736682797743L;  </span><br><span class="line">     private String name;  </span><br><span class="line">     private int age;  </span><br><span class="line">     private Map&lt;String,Integer&gt; map;  </span><br><span class="line">     public Simple()&#123;  </span><br><span class="line">  </span><br><span class="line">     &#125;  </span><br><span class="line">     public Simple(String name,int age,Map&lt;String,Integer&gt; map)&#123;  </span><br><span class="line">         this.name &#x3D; name;  </span><br><span class="line">         this.age &#x3D; age;  </span><br><span class="line">         this.map &#x3D; map;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public String getName() &#123;  </span><br><span class="line">       return name;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public void setName(String name) &#123;  </span><br><span class="line">        this.name &#x3D; name;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public int getAge() &#123;  </span><br><span class="line">        return age;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public void setAge(int age) &#123;  </span><br><span class="line">        this.age &#x3D; age;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public Map&lt;String, Integer&gt; getMap() &#123;  </span><br><span class="line">        return map;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">     public void setMap(Map&lt;String, Integer&gt; map) &#123;  </span><br><span class="line">        this.map &#x3D; map;  </span><br><span class="line">     &#125;  </span><br><span class="line">  </span><br><span class="line">  </span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Native Java serialization: OriginalSerializable.java</strong></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span 
class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br></pre></td><td class="code"><pre><span class="line">import java.io.FileInputStream;</span><br><span class="line">import java.io.FileNotFoundException;</span><br><span class="line">import java.io.FileOutputStream;</span><br><span class="line">import java.io.IOException;</span><br><span class="line">import java.io.ObjectInputStream;</span><br><span class="line">import java.io.ObjectOutputStream;</span><br><span class="line">import java.util.HashMap;</span><br><span class="line">import java.util.Map;</span><br><span class="line"></span><br><span class="line">import bhz.entity.Simple;</span><br><span class="line"></span><br><span class="line">public class OriginalSerializable &#123;  </span><br><span class="line">      </span><br><span class="line">    public static void main(String[] args) throws IOException, ClassNotFoundException &#123;  </span><br><span class="line">        long start &#x3D;  System.currentTimeMillis();  </span><br><span class="line">        setSerializableObject();  </span><br><span class="line">        System.out.println(&quot;Java native serialization time: &quot; + (System.currentTimeMillis() - start) + &quot; ms&quot; );    </span><br><span class="line">        start &#x3D;  System.currentTimeMillis();  </span><br><span class="line">        getSerializableObject();  </span><br><span class="line">        System.out.println(&quot;Java native deserialization time: &quot; + (System.currentTimeMillis() - start) + &quot; ms&quot;);  </span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">    public static void setSerializableObject() throws IOException&#123;  </span><br><span class="line">  </span><br><span class="line">        FileOutputStream fo &#x3D; new FileOutputStream(&quot;D:&#x2F;file2.bin&quot;);  </span><br><span class="line">  </span><br><span class="line">        
ObjectOutputStream so &#x3D; new ObjectOutputStream(fo);  </span><br><span class="line">  </span><br><span class="line">        for (int i &#x3D; 0; i &lt; 100000; i++) &#123;  </span><br><span class="line">            Map&lt;String,Integer&gt; map &#x3D; new HashMap&lt;String, Integer&gt;(2);  </span><br><span class="line">            map.put(&quot;zhang0&quot;, i);  </span><br><span class="line">            map.put(&quot;zhang1&quot;, i);  </span><br><span class="line">            so.writeObject(new Simple(&quot;zhang&quot;+i,(i+1),map));  </span><br><span class="line">        &#125;  </span><br><span class="line">        so.flush();  </span><br><span class="line">        so.close();  </span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">    public static void getSerializableObject()&#123;  </span><br><span class="line">         FileInputStream fi;  </span><br><span class="line">        try &#123;  </span><br><span class="line">            fi &#x3D; new FileInputStream(&quot;D:&#x2F;file2.bin&quot;);  </span><br><span class="line">            ObjectInputStream si &#x3D; new ObjectInputStream(fi);  </span><br><span class="line">  </span><br><span class="line">            Simple simple &#x3D;null;  </span><br><span class="line">            while((simple&#x3D;(Simple)si.readObject()) !&#x3D; null)&#123;  </span><br><span class="line">                &#x2F;&#x2F;System.out.println(simple.getAge() + &quot;  &quot; + simple.getName());  </span><br><span class="line">            &#125;  </span><br><span class="line">            fi.close();  </span><br><span class="line">            si.close();  </span><br><span class="line">        &#125; catch (FileNotFoundException e) &#123;  </span><br><span class="line">            e.printStackTrace();  </span><br><span class="line">        &#125; catch (IOException e) &#123;  </span><br><span class="line">            &#x2F;&#x2F;e.printStackTrace();  &#x2F;&#x2F; readObject() never returns null: at end of file it throws EOFException (an IOException), which lands here and ends the loop, so the close() calls above are never reached</span><br><span class="line">   
     &#125; catch (ClassNotFoundException e) &#123;  </span><br><span class="line">            e.printStackTrace();  </span><br><span class="line">        &#125;  </span><br><span class="line"></span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Kryo serialization: KyroSerializable.java</strong></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span 
class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line">import java.io.FileInputStream;</span><br><span class="line">import java.io.FileNotFoundException;</span><br><span class="line">import java.io.FileOutputStream;</span><br><span class="line">import java.io.IOException;</span><br><span class="line">import java.util.HashMap;</span><br><span class="line">import java.util.Map;</span><br><span class="line"></span><br><span class="line">import org.objenesis.strategy.StdInstantiatorStrategy;</span><br><span class="line"></span><br><span class="line">import bhz.entity.Simple;</span><br><span class="line"></span><br><span class="line">import com.esotericsoftware.kryo.Kryo;</span><br><span class="line">import com.esotericsoftware.kryo.KryoException;</span><br><span class="line">import com.esotericsoftware.kryo.io.Input;</span><br><span class="line">import com.esotericsoftware.kryo.io.Output;</span><br><span class="line"></span><br><span class="line">public class KyroSerializable &#123;  </span><br><span class="line">      </span><br><span class="line">    public static void main(String[] args) throws IOException &#123;  </span><br><span class="line">        long start &#x3D;  System.currentTimeMillis();  </span><br><span class="line">        setSerializableObject();  </span><br><span class="line">        
System.out.println(&quot;Kryo serialization time: &quot; + (System.currentTimeMillis() - start) + &quot; ms&quot; );  </span><br><span class="line">        start &#x3D;  System.currentTimeMillis();  </span><br><span class="line">        getSerializableObject();  </span><br><span class="line">        System.out.println(&quot;Kryo deserialization time: &quot; + (System.currentTimeMillis() - start) + &quot; ms&quot;);  </span><br><span class="line">  </span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">    public static void setSerializableObject() throws FileNotFoundException&#123;  </span><br><span class="line">  </span><br><span class="line">        Kryo kryo &#x3D; new Kryo();  </span><br><span class="line">        kryo.setReferences(false);  </span><br><span class="line">        kryo.setRegistrationRequired(false);  </span><br><span class="line">        kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());  </span><br><span class="line">        kryo.register(Simple.class);  </span><br><span class="line">        Output output &#x3D; new Output(new FileOutputStream(&quot;D:&#x2F;file1.bin&quot;));  </span><br><span class="line">        for (int i &#x3D; 0; i &lt; 100000; i++) &#123;  </span><br><span class="line">            Map&lt;String,Integer&gt; map &#x3D; new HashMap&lt;String, Integer&gt;(2);  </span><br><span class="line">            map.put(&quot;zhang0&quot;, i);  </span><br><span class="line">            map.put(&quot;zhang1&quot;, i);  </span><br><span class="line">            kryo.writeObject(output, new Simple(&quot;zhang&quot;+i,(i+1),map));  </span><br><span class="line">        &#125;  </span><br><span class="line">        output.flush();  </span><br><span class="line">        output.close();  </span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">  </span><br><span class="line">    public static void getSerializableObject()&#123;  </span><br><span class="line">        
Kryo kryo &#x3D; new Kryo();  </span><br><span class="line">        kryo.setReferences(false);  </span><br><span class="line">        kryo.setRegistrationRequired(false);  </span><br><span class="line">        kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());  </span><br><span class="line">        Input input;  </span><br><span class="line">        try &#123;  </span><br><span class="line">            input &#x3D; new Input(new FileInputStream(&quot;D:&#x2F;file1.bin&quot;));  </span><br><span class="line">            Simple simple &#x3D;null;  </span><br><span class="line">            while((simple&#x3D;kryo.readObject(input, Simple.class)) !&#x3D; null)&#123;  </span><br><span class="line">                &#x2F;&#x2F;System.out.println(simple.getAge() + &quot;  &quot; + simple.getName() + &quot;  &quot; + simple.getMap().toString());  </span><br><span class="line">            &#125;  </span><br><span class="line">  </span><br><span class="line">            input.close();  </span><br><span class="line">        &#125; catch (FileNotFoundException e) &#123;  </span><br><span class="line">            e.printStackTrace();  </span><br><span class="line">        &#125; catch(KryoException e)&#123;  </span><br><span class="line">            &#x2F;&#x2F; At end of file Kryo throws KryoException (buffer underflow), which ends the read loop; note that this also skips input.close() and silently swallows any other KryoException</span><br><span class="line">        &#125;  </span><br><span class="line">    &#125;  </span><br><span class="line">  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><blockquote><p>Java native serialization time: 8281 ms<br>Java native deserialization time: 5899 ms</p></blockquote><blockquote><p>Kryo serialization time: 630 ms<br>Kryo deserialization time: 15 ms</p></blockquote><p>The comparison shows that in this test Kryo serialized about 13 times faster than native Java serialization (630 ms vs. 8281 ms), and deserialized faster still.<br><br><br>The official Spark documentation also recommends using the Kryo library (version 2) wherever possible. According to the documentation, Kryo is often as much as 10 times faster and more compact than Java serialization. Spark does not use Kryo as its default serializer because Kryo does not support every serializable type and requires users to register the classes to be serialized before use, which is less convenient.</p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;a name=&quot;5hXkC&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h3 id=&quot;序列化在Spark中的用处&quot;&gt;&lt;a href=&quot;#序列化在Spark中的用处&quot; 
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="spark" scheme="cpeixin.cn/tags/spark/"/>
    
  </entry>
  
  <entry>
    <title>Redis Underlying Data Structures</title>
    <link href="cpeixin.cn/2020/08/30/Redis%E5%BA%95%E5%B1%82%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/"/>
    <id>cpeixin.cn/2020/08/30/Redis%E5%BA%95%E5%B1%82%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/</id>
    <published>2020-08-30T06:47:56.000Z</published>
    <updated>2020-09-07T03:30:07.709Z</updated>
    
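    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><strong>Which data structures underlie the common data types of the classic database Redis?</strong><br></p><p><a name="hCPZb"></a></p><h3 id="Redis-数据库介绍"><a href="#Redis-数据库介绍" class="headerlink" title="Redis 数据库介绍"></a>Introduction to Redis</h3><p>Redis is a key-value database. In contrast to relational databases such as MySQL, Redis is also called a non-relational database. A relational database like MySQL has fairly complex table schemas with many columns, and SQL statements can express very complex queries. Redis, by contrast, stores only two parts, a key and a value, and a value can be looked up only by its key.<br><br><br>This simple storage model is a big part of why Redis reads and writes so fast. Redis is also used mainly as an in-memory database, meaning the data is kept in memory. Even so, it can also persist data to disk.<br><br><br>We will come back to that later. In Redis, keys are strings, but to offer richer ways of storing data, values come in many types. The commonly used ones are <strong>strings, lists, hashes (dictionaries), sets, and sorted sets</strong>.<br><br><br>The string type is simple: it corresponds directly to the string data structure, which you should already know well, so I won't dwell on it. Let's focus on the other four, more complex types and the data structures they rely on underneath.<br></p><p><a name="a5oTv"></a></p><h3 id="列表（list）"><a href="#列表（list）" class="headerlink" title="列表（list）"></a>List</h3><p>Let's start with lists. A list stores a sequence of values, and it has two implementations: <strong>a compressed list (ziplist) and a doubly linked list</strong>. (Redis's linked list is not circular: the head's prev pointer and the tail's next pointer are NULL. Also, this describes older Redis versions; since Redis 3.2, lists are stored as a quicklist, a doubly linked list of ziplists.)<br><br><br>When a list holds only a small amount of data, it can be implemented as a ziplist. Specifically, both of the following conditions must hold: <strong>every element stored in the list (possibly a string) is smaller than 64 bytes, and the list has fewer than 512 elements</strong>.<br><br><br>A quick word on the ziplist. It is not a textbook data structure but a storage format that Redis designed itself. Much like an array, it stores data in one contiguous block of memory. It does not support efficient random access (reaching the i-th entry means walking the entries), but Redis usually fetches an entire value by its key, that is, the whole ziplist, so random access is not needed. Unlike an array, however, a ziplist allows entries of different sizes. Its layout is simple, as the figure below shows.<br><br><br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598767682916-bb139261-bf60-4680-8404-8c9e623b9474.jpeg#align=left&display=inline&height=381&margin=%5Bobject%20Object%5D&name=49fd8d46eb94f463ace98717f11c2cb5.jpg&originHeight=381&originWidth=1142&size=63756&status=done&style=none&width=1142" alt="49fd8d46eb94f463ace98717f11c2cb5.jpg"><br>So what does "compressed" mean in "compressed list"? The word suggests saving memory, and the savings are relative to the way an array stores data. We know that <strong>an array requires every element to be the same size</strong>, so to store strings of different lengths we would have to size every element for the longest string (say, 20 bytes). Every string shorter than 20 bytes then wastes part of its slot. That sounds a bit abstract, so here is a figure to illustrate it.<br><img 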
src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598767778513-a72b0e53-fde9-44a1-9af1-173dfbf1cca6.jpeg#align=left&display=inline&height=415&margin=%5Bobject%20Object%5D&name=2e2f2e5a2fe25d26dc2fc04cfe88f869.jpg&originHeight=415&originWidth=1142&size=61374&status=done&style=none&width=1142" alt="2e2f2e5a2fe25d26dc2fc04cfe88f869.jpg"><br>The ziplist layout therefore saves memory on the one hand and can store data of different types on the other.<br><br><br>Moreover, because the data sits in one contiguous block of memory, reading a list value by its key is very efficient. When a list holds more data, that is, when the two conditions above are not both met, it is implemented as a doubly linked list instead.<br><br><br>We covered the doubly linked list data structure in the linked-list chapter, so here let's focus on how Redis encodes it. The way Redis implements its doubly linked list is well worth borrowing: besides the node struct, it defines a separate list struct that holds the head and tail pointers, the length, and other information, which makes the list very convenient to use.<br></p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">// The following is C code, since Redis is written in C.</span><br><span class="line">typedef struct listNode &#123;</span><br><span class="line">  struct listNode *prev;  // NULL for the head node: the list is not circular</span><br><span class="line">  struct listNode *next;  // NULL for the tail node</span><br><span class="line">  void *value;</span><br><span class="line">&#125; listNode;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">typedef struct list &#123;</span><br><span class="line">  listNode *head;</span><br><span class="line">  listNode *tail;</span><br><span class="line">  unsigned long len;</span><br><span class="line">  // ... other fields (dup/free/match callbacks) omitted</span><br><span class="line">&#125; list;</span><br></pre></td></tr></table></figure><p><a name="0PECO"></a></p><h3 id="字典（hash）"><a 
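href="#字典（hash）" class="headerlink" title="字典（hash）"></a>Hash (dictionary)</h3><p>The hash type stores a set of data pairs, each made up of a key and a value. Hashes also have two implementations: <strong>the ziplist we just discussed, and a hash table</strong>.<br><br><br>Again, Redis uses a ziplist for a hash only when it holds a small amount of data. Specifically, two conditions must hold: every key and value stored in the hash is smaller than 64 bytes, and the hash has fewer than 512 key-value pairs.<br><br><br>When the two conditions are not both met, Redis implements the hash with a hash table. It uses MurmurHash2, a hash algorithm that is fast and has good randomness, as the hash function (Redis 4.0 and later switched to SipHash), and it resolves hash collisions by chaining with linked lists.<br><br><br>Redis also grows and shrinks hash tables dynamically. As data is added, the table's load factor keeps rising. To keep performance from degrading, Redis triggers an expansion <strong>when the load factor reaches 1</strong>, growing the table to roughly twice its size (the exact size is computed; read the source code if you are interested).<br><br><br>As data is removed, Redis triggers a shrink to save memory <strong>when the load factor drops below 0.1</strong>, shrinking the table to roughly twice the number of entries it holds (this value is also computed; again, see the source if you are interested). As discussed earlier, expanding or shrinking means moving a lot of data and recomputing hash values, which is slow. To address this, Redis uses the incremental resizing strategy covered in the hash table chapter: it moves the data in batches, avoiding the service pause that moving everything at once would cause.<br></p><p><a name="GsT5l"></a></p><h3 id="集合（set）"><a href="#集合（set）" class="headerlink" title="集合（set）"></a>Set</h3><p>The set type stores a collection of unique values. It also has two implementations: <strong>one based on a sorted array (intset), and one based on a hash table</strong>.<br><br><br>Redis implements a set with a sorted array when both of the following conditions hold: all stored values are integers, and the set has no more than 512 elements. When the two conditions are not both met, Redis stores the set's data in a hash table.<br></p><p><a name="UfTbJ"></a></p><h3 id="有序集合（sortedset）"><a href="#有序集合（sortedset）" class="headerlink" title="有序集合（sortedset）"></a>Sorted set</h3><p>We covered sorted sets in detail in the skip list chapter. A sorted set stores a collection of members, each carrying a score. The members are organized by score into a skip list, which supports fast lookups by score value and by score range.<br><br><br><strong>In fact, like Redis's other data types, sorted sets are not implemented only with skip lists. When the amount of data is small, Redis implements a sorted set with a ziplist</strong>. Specifically, a ziplist is used only when both of these conditions hold: every member is smaller than 64 bytes, and there are fewer than 128 elements.<br></p><p><a name="cQYVz"></a></p><h3 id="数据结构持久化"><a href="#数据结构持久化" class="headerlink" title="数据结构持久化"></a>Persisting data structures</h3><p>Although Redis is often used as an in-memory database, it also supports persistence, that is, writing the in-memory data to disk. That way, the data stored in Redis survives a power failure: after the machine restarts, Redis simply reads the data back from disk into memory and carries on working (depending on the persistence settings, the most recent writes may still be lost).<br><br><br>As mentioned, Redis data consists of a key and a value, and values come in many types, such as strings, lists, hashes, sets, and sorted sets. <strong>Types such as hashes and sets use hash tables underneath</strong>, and hash tables contain pointers, which refer to addresses in memory. So how does Redis write a data structure tied to specific memory addresses to disk?<br><br><br>The problem Redis faces here is not unusual; it comes up in many settings. We call it the data-structure persistence problem, or the object persistence problem, where "persistence" can loosely be understood as "storing to disk." There are two main approaches to persisting a data structure to disk. <strong>The first discards the original storage structure and writes only the data to disk</strong>. When the data needs to be restored from disk into memory, it is reorganized into the original data structure.<br><br><br>This is in fact the approach Redis takes. It has a drawback, though: restoring data from disk into memory takes considerable time. For example, if we write a hash table's data to disk and later rebuild the hash table from that data, the hash of every entry must be recomputed. With several GB of data on disk, the time to rebuild the data structures is far from negligible.<br><br><br>The second approach keeps the original storage format and writes the data to disk in that format. Taking a hash table as an example, we could save the table's size, the slot number each entry hashes to, and similar information on disk. With that information, restoring the data from disk into memory does not require recomputing any hash values.<br></p><p><a name="1BhOU"></a></p><h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>Today we looked at the data structures underlying Redis's common data types. In summary, there are roughly five:</p><ul><li>Ziplist (think of it as a special kind of array)</li><li>Sorted array</li><li>Linked list</li><li>Hash table</li><li>Skip list</li></ul><p><a name="0w5tj"></a></p><h3 id="Q-amp-A"><a href="#Q-amp-A" class="headerlink" title="Q&amp;A"></a>Q&amp;A</h3><p>Q: Why does Redis use a skip list instead of a B+ tree?<br>A: Skip lists are more flexible and easier to implement. A B+ tree's main strength, a high fan-out that minimizes disk reads, matters little for a structure kept in memory, and a skip list still supports the range queries that sorted sets need.<br><br><br></p><!-- rebuild by neat -->]]></content>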
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;strong&gt;Which data structures underlie the common data types of the classic database Redis?&lt;/strong&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;a name=
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="redis" scheme="cpeixin.cn/tags/redis/"/>
    
  </entry>
  
  <entry>
    <title>Distributed Consistency Algorithms - Paxos (2)</title>
    <link href="cpeixin.cn/2020/08/30/%E5%88%86%E5%B8%83%E5%BC%8F%E4%B8%80%E8%87%B4%E6%80%A7%E7%AE%97%E6%B3%95-Paxos%EF%BC%882%EF%BC%89/"/>
    <id>cpeixin.cn/2020/08/30/%E5%88%86%E5%B8%83%E5%BC%8F%E4%B8%80%E8%87%B4%E6%80%A7%E7%AE%97%E6%B3%95-Paxos%EF%BC%882%EF%BC%89/</id>
    <published>2020-08-30T05:10:05.000Z</published>
    <updated>2020-08-30T05:13:21.224Z</updated>
    
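    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><strong>Paxos is a consensus algorithm, not a consistency protocol</strong></p><p>Basic Paxos can reach consensus only on a single value; on its own, it cannot reach consensus on a series of values. Lamport did mention that consensus on a series of values can be achieved by running multiple Basic Paxos instances (for example, running Basic Paxos once for each value received). Yet many readers finish the paper still in the dark: they understand every English word but still do not understand the Multi-Paxos Lamport describes. Why is Multi-Paxos so hard to understand?<br><br><br>In my view, Lamport never explained Multi-Paxos clearly. He only sketched the general idea and left out both the details of the algorithm and the details needed to program it (for example, how to elect a leader). As a result, every implementation of Multi-Paxos is different. In essence, though, everyone fills in details on top of Lamport's Multi-Paxos idea, designs their own Multi-Paxos algorithm, and then implements it (for example, Chubby's Multi-Paxos implementation and the Raft algorithm).<br><br><br>So let me state this explicitly: <strong>the Multi-Paxos Lamport described is an idea, not an algorithm</strong>. "Multi-Paxos algorithm" is an umbrella term for algorithms that build on the Multi-Paxos idea and reach consensus on a series of values through multiple Basic Paxos instances (for example, Chubby's Multi-Paxos implementation and Raft). Keep this distinction in mind.<br><br><br>To help you grasp the Multi-Paxos idea, I will first walk through how Lamport thought about Multi-Paxos, that is, how to address the pain points of Basic Paxos, and then use Chubby's Multi-Paxos implementation as a concrete example. Why Chubby? Because its implementation is where the Multi-Paxos idea actually landed in production: it turned an idea into working code. <a href="https://github.com/cocagne/multi-paxos-example" target="_blank" rel="external nofollow noopener noreferrer">Here is some example code from six years ago 🐂</a><br></p><p><a name="7dCdD"></a></p><h3 id="兰伯特关于-Multi-Paxos-的思考"><a href="#兰伯特关于-Multi-Paxos-的思考" class="headerlink" title="兰伯特关于 Multi-Paxos 的思考"></a>Lamport's Thinking on Multi-Paxos</h3><p><br>If you are familiar with Basic Paxos, you may recall that it reaches consensus in two phases. Only a proposer that receives prepare responses from a majority in the first phase (the prepare phase) can send accept requests and move on to the second phase (the accept phase):<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598756133779-168f44db-0034-47af-b6bb-736de22b247a.jpeg#align=left&display=inline&height=643&margin=%5Bobject%20Object%5D&name=aafabff1fe2a26523e9815805ccca6e0.jpg&originHeight=643&originWidth=1142&size=231174&status=done&style=none&width=1142" alt="aafabff1fe2a26523e9815805ccca6e0.jpg"><br>If we simply run Basic Paxos instances repeatedly to agree on a series of values, two problems arise. First, if several proposers submit proposals at the same time, their proposal numbers may conflict so that no proposer receives prepare responses from a majority in the prepare phase; the negotiation fails and has to be retried.<br><br><br>Imagine a 5-node cluster in which 3 nodes act as proposers and propose at the same time. The prepare phase can fail because no proposer receives a majority of responses (for example, one proposer receives 1 prepare response and the other two receive 2 each), and the negotiation has to start over. Second, two rounds of RPC (the prepare phase and the accept phase) mean many round-trip messages, high cost, and high latency.<br><br><br>Keep in mind that distributed systems are built on RPC, so latency is a perennial pain point that must be carefully considered and optimized when developing them. So how do we solve these two problems? By introducing a leader and by optimizing how Basic Paxos is executed. Let's start with the leader.<br></p><p><a name="t5nVu"></a></p><h3 id="领导者（Leader）"><a href="#领导者（Leader）" class="headerlink" title="领导者（Leader）"></a>Leader</h3><p>We can introduce a leader node that acts as the only proposer. With a single proposer, multiple proposers never submit proposals at the same time, so proposals never conflict:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598756753905-90610e2f-018b-45e5-8770-f274b36c86ae.jpeg#align=left&display=inline&height=653&margin=%5Bobject%20Object%5D&name=af3d6a291d960ace59a88898abb74ef6.jpg&originHeight=653&originWidth=1142&size=183927&status=done&style=none&width=1142" alt="af3d6a291d960ace59a88898abb74ef6.jpg"><br>One note here: the paper does not say how to elect a leader; we have to design that ourselves when implementing a Multi-Paxos algorithm. In Chubby, for example, the master node (the leader) is elected by a vote carried out by running Basic Paxos. So how do we solve the second problem, that is, how do we optimize the execution of Basic Paxos?<br></p><p><a name="72Cl1"></a></p><h3 id="优化-Basic-Paxos-执行"><a href="#优化-Basic-Paxos-执行" class="headerlink" title="优化 Basic Paxos 执行"></a>Optimizing Basic Paxos Execution</h3><p>We can optimize Basic Paxos execution with the rule "when the leader is stable, skip the prepare phase and go straight to the accept phase." In other words, the commands in the leader's sequence are up to date, so the leader no longer needs prepare requests to discover proposals already accepted by a majority, and it can choose the value of a proposal on its own. When committing a command, the leader can therefore skip the prepare phase and go directly to the accept phase:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598756869035-5e649d41-b6fa-48c1-9141-7ddb36939872.jpeg#align=left&display=inline&height=637&margin=%5Bobject%20Object%5D&name=3cd72a4a138fe1cde52aedd1b897f954.jpg&originHeight=637&originWidth=1142&size=176151&status=done&style=none&width=1142" alt="3cd72a4a138fe1cde52aedd1b897f954.jpg"><br>Compared with running Basic Paxos repeatedly, Multi-Paxos with a leader has only one proposer, whose decision is final, so proposals never conflict. In addition, while the leader is stable the prepare phase is skipped and requests go straight to the accept phase, which greatly reduces the number of round-trip messages, improving performance and lowering latency. You may now ask: how is Multi-Paxos implemented in a real system? Next, I will use Chubby's Multi-Paxos implementation as an example.<br></p><p><a name="wnoLq"></a></p><h3 id="Chubby-的-Multi-Paxos-实现"><a href="#Chubby-的-Multi-Paxos-实现" class="headerlink" title="Chubby 的 Multi-Paxos 实现"></a>Chubby's Multi-Paxos Implementation</h3><p>Since Lamport only outlined the Multi-Paxos idea, how did Chubby fill in the details to implement a Multi-Paxos algorithm? First, it introduced a master node, which implements the leader Lamport described. The master is the only proposer, so multiple proposers never submit proposals at the same time and proposals never conflict.<br><br><br>In Chubby, the master is elected by a vote carried out by running Basic Paxos, and while it runs, the master keeps extending its term by renewing its lease. In practice, the same node often stays master for days. If the master fails, the other nodes elect a new one, so apart from brief failover windows there is always exactly one master. Second, Chubby implements Lamport's optimization "when the leader is stable, skip the prepare phase and go straight to the accept phase."<br><br><br>Third, Chubby implements group membership changes, so the cluster keeps running smoothly when nodes join or leave. One more point: to provide strong consistency, Chubby also serves reads only on the master. That is, once a write succeeds, every client that reads afterward sees the same data. The process works as follows.<br><br><br>The master handles all read and write requests. When it receives a write request from a client, it acts as the proposer and runs a Basic Paxos instance that sends the data to all nodes, and it replies success to the client only after a majority of servers have accepted the write:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598757183206-92749db1-4909-437c-b122-3d39453a4737.jpeg#align=left&display=inline&height=590&margin=%5Bobject%20Object%5D&name=7e2c2e194d5a0fda5594c5e4e2d9ecb9.jpg&originHeight=590&originWidth=1142&size=171628&status=done&style=none&width=1142" alt="7e2c2e194d5a0fda5594c5e4e2d9ecb9.jpg"><br>Reads are simpler: when the master receives a read request, it just looks up its local data and returns it to the client:<br><br><br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598757195327-c5163200-67aa-4501-b38d-01a1f53fb4d3.jpeg#align=left&display=inline&height=583&margin=%5Bobject%20Object%5D&name=07501bb8d9015af3fb34cf856fe3ec64.jpg&originHeight=583&originWidth=1142&size=130053&status=done&style=none&width=1142" alt="07501bb8d9015af3fb34cf856fe3ec64.jpg"><br>Although Chubby's Multi-Paxos implementation is closed source, it is a real-world realization of the Multi-Paxos idea: the Chubby team not only turned the theory into code but also worked out how to fill in the missing details. Their reasoning and design are well worth studying, both for understanding the Multi-Paxos idea and for understanding other Multi-Paxos algorithms such as Raft.<br></p><p><a name="jpznI"></a></p><h3 id="内容小结"><a href="#内容小结" class="headerlink" title="内容小结"></a>Summary</h3><p>The key points:</p><ul><li>The Multi-Paxos Lamport described is an idea, not an algorithm, and it lacks both the details of the algorithm and the details needed to program it, such as how to elect a leader. That is why every Multi-Paxos implementation is different. "Multi-Paxos algorithm" is an umbrella term for algorithms that build on the Multi-Paxos idea and reach consensus on a series of values through multiple Basic Paxos instances (for example, Chubby's Multi-Paxos implementation and Raft).</li><li>Chubby implements a master (the leader Lamport described) as well as Lamport's optimization "when the leader is stable, skip the prepare phase and go straight to the accept phase." Skipping the prepare phase of Basic Paxos makes committing data more efficient, but because all write requests are handled by the master, the cluster's capacity for concurrent writes is limited to roughly that of a single machine.</li><li>Chubby's Multi-Paxos implementation also follows the majority rule: the cluster works as long as a majority of nodes are running, so Chubby can tolerate the failure of (n - 1)/2 nodes (rounded down).</li><li>In essence, the "when the leader is stable, skip the prepare phase and go straight to the accept phase" optimization improves performance by removing unnecessary negotiation steps. This technique is very common and effective. Google's QUIC protocol, for example, improves HTTPS performance by reducing the TCP and TLS handshake steps. I hope you take this approach to heart, so that when needed you can improve system performance by cutting unnecessary steps.</li></ul><p><br>Finally, I have to say that I am personally fond of Paxos (Lamport's Basic Paxos and Multi-Paxos). Multi-Paxos leaves out algorithmic details, but that actually gives us room to think: we can keep reasoning about and investigating the missing pieces, such as whether Multi-Paxos needs to elect a leader at all, or how proposal numbers should be generated.</p><!-- rebuild by neat -->]]></content>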
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;strong&gt;Paxos is a consensus algorithm, not a consistency protocol&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Basic Paxos can reach consensus only on a single value;
      
    
    </summary>
    
    
      <category term="分布式" scheme="cpeixin.cn/categories/%E5%88%86%E5%B8%83%E5%BC%8F/"/>
    
    
      <category term="paxos" scheme="cpeixin.cn/tags/paxos/"/>
    
  </entry>
  
  <entry>
    <title>Distributed Consistency Algorithms - Paxos (1)</title>
    <link href="cpeixin.cn/2020/08/29/%E5%88%86%E5%B8%83%E5%BC%8F%E4%B8%80%E8%87%B4%E6%80%A7%E7%AE%97%E6%B3%95-Paxos%EF%BC%881%EF%BC%89/"/>
    <id>cpeixin.cn/2020/08/29/%E5%88%86%E5%B8%83%E5%BC%8F%E4%B8%80%E8%87%B4%E6%80%A7%E7%AE%97%E6%B3%95-Paxos%EF%BC%881%EF%BC%89/</id>
    <published>2020-08-28T16:54:44.000Z</published>
    <updated>2020-08-30T06:03:04.075Z</updated>
    
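    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><strong>Life has limits but knowledge has none; only by mastering the core principle, the “Dao”, can you meet endless change with constancy.</strong></p><p><a name="dIhdX"></a></p><h3 id="Paxos-算法"><a href="#Paxos-算法" class="headerlink" title="Paxos 算法"></a>The Paxos Algorithm</h3><p>Any discussion of distributed algorithms has to mention Paxos. For decades it has been almost synonymous with distributed consensus, because many of today's most widely used consensus algorithms build on it, such as Fast Paxos, Cheap Paxos, and Raft. Yet many people stumble when trying to understand Paxos accurately and systematically: they know it can be used to reach consensus, but not how it does so.<br><br><br>This hints that Paxos is genuinely difficult. Distributed algorithms are complex by nature, and Paxos is no exception; part of the difficulty also comes from how Lamport presented it. The Paxos algorithm Lamport proposed has two parts:</p><ul><li>Basic Paxos, which describes how multiple nodes reach consensus on a single value (the proposal value);</li><li>the Multi-Paxos idea, which describes running multiple Basic Paxos instances to reach consensus on a series of values.</li></ul><p><br>Because the Multi-Paxos idea lacks details needed to implement it in code (such as how to elect a leader), I will cover Basic Paxos and Multi-Paxos in turn: how Basic Paxos reaches consensus, and how Multi-Paxos improves on the limitations of Basic Paxos (<a href="https://github.com/henryr/toy_paxos/blob/master/toy_paxos.py" target="_blank" rel="external nofollow noopener noreferrer">here is some code from ten years ago 🐂</a>). Today, let's start with Basic Paxos.<br><br><br>In my view, Basic Paxos is the core of the Multi-Paxos idea; put simply, Multi-Paxos is Basic Paxos run many times. Once you master it, you will better understand the consensus algorithms built on the Multi-Paxos idea in later posts (such as Raft), grasp the core of distributed consensus, and be able to make trade-offs and design your own algorithm when existing ones cannot meet your business needs.<br><br><br>Suppose we want to build a distributed cluster of nodes A, B, and C that provides a read-only KV store. When a read-only variable is created, it must be assigned a value, and that value cannot be modified afterward. Because a node cannot change a read-only variable once it has created it, all nodes must first agree on the variable's value and only then create it together.<br><br><br>Now, when multiple clients (say, clients 1 and 2) access the system and try to create the same read-only variable (say, X), with client 1 trying to create X with value 3 and client 2 trying to create X with value 7, how do the nodes reach consensus so that X has the same value on every node? Let's keep this question in mind as we go.<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598714240201-fe8f451b-97ca-42f5-970a-16dc62caead3.jpeg#align=left&display=inline&height=424&margin=%5Bobject%20Object%5D&name=93a9fa0a75c23971066dc791389b8567.jpg&originHeight=424&originWidth=1142&size=123358&status=done&style=none&width=1142" alt="93a9fa0a75c23971066dc791389b8567.jpg"><br>Classic algorithms often come with vivid, distinctive concepts (such as the coordinator in the two-phase commit protocol), and Basic Paxos is no exception. To help people understand Basic Paxos, Lamport used several distinctive and important concepts in his explanation: proposals, prepare requests, accept requests, roles, and so on. The most important of these is the role, because roles abstract the three core functions of Basic Paxos; for example, the acceptor votes on proposed values and stores the accepted value.<br></p><p><a name="jcnh4"></a></p><h3 id="你需要了解的三种角色"><a href="#你需要了解的三种角色" class="headerlink" title="你需要了解的三种角色"></a>The Three Roles You Need to Know</h3><p><br>Basic Paxos has three roles: proposer, acceptor, and learner. They relate to one another as follows:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598718407860-296e72de-7e71-4627-83b7-d532e17b53ed.jpeg#align=left&display=inline&height=620&margin=%5Bobject%20Object%5D&name=77be9903f7cbe980e5a6e77412d2ad42.jpg&originHeight=620&originWidth=1142&size=183161&status=done&style=none&width=1142" alt="77be9903f7cbe980e5a6e77412d2ad42.jpg"><br>It may look complicated, but it is not hard to understand:</p><ul><li>Proposer: proposes a value to be voted on. For ease of demonstration, you can treat clients 1 and 2 in Figure 1 as proposers. In most real deployments, however, the proposer is the node in the cluster that receives the client request (the architecture in Figure 1 is only for demonstrating how the algorithm works). The advantage is that business code is left untouched: we do not need to implement the algorithm's logic in business code and can access backend data just as we would a database.</li><li>Acceptor: votes on each proposed value and stores the accepted value, for example nodes A, B, and C. Generally, every node in the cluster plays the acceptor role, taking part in consensus negotiation and accepting and storing data.</li></ul><p><br>You may be puzzled at this point: didn't I just say the node that receives client requests is the proposer? Why is it an acceptor here? Because one node (or process) can play several roles. Picture a 3-node cluster where one node receives a request: that node, as proposer, starts the two-phase process, and then it and the other 2 nodes negotiate consensus together as acceptors, as shown below:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598718495418-4406554e-ce13-43a9-b773-8248e40bb0de.jpeg#align=left&display=inline&height=590&margin=%5Bobject%20Object%5D&name=3fed0fe5682f97f0a9249cf9519d09fe.jpg&originHeight=590&originWidth=1142&size=146259&status=done&style=none&width=1142" alt="3fed0fe5682f97f0a9249cf9519d09fe.jpg"></p><ul><li>Learner: is told the result of the vote and accepts and stores the agreed value, but does not take part in voting.</li></ul><p><br>Typically, learners are backup nodes, like the slave in a master-slave model, passively receiving data for disaster recovery. In essence, the three roles represent three functions:</p><ul><li>The proposer handles access and coordination: after receiving a client request, it starts the two-phase process to negotiate consensus;</li><li>The acceptor handles voting and storage: it votes on proposed values and accepts and stores the agreed value;</li><li>The learner handles storage only: it does not take part in negotiation and simply accepts and stores the agreed value.</li></ul><p><br>Because a complete run of the algorithm is made up of the functions of these three roles, understanding them is the foundation for understanding how Basic Paxos reaches consensus on a proposed value.<br></p><p><a name="b4bY8"></a></p><h3 id="如何达成共识？"><a href="#如何达成共识？" class="headerlink" 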
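title="如何达成共识？"></a>How Is Consensus Reached?</h3><p>Imagine this scenario: the epidemic is severe and nearly every village has sealed its roads, but your village committee has done nothing and still has no epidemic-prevention measures. You decide to submit a proposal to the committee with some prevention suggestions. Besides the suggestions, your proposal needs a proposal number that uniquely identifies it and distinguishes it from other villagers' proposals. In the same way, Basic Paxos uses a proposal to represent a suggestion, except that besides the proposal number, a proposal also carries a proposed value.<br><br><br>For convenience, I write a proposal as [n, v], where n is the proposal number and v is the proposed value. Note that the whole consensus negotiation happens in two phases.<br><br><br>So how does the negotiation work? Suppose client 1's proposal number is 1 and client 2's is 5, and suppose nodes A and B receive the prepare request from client 1 first, while node C receives the prepare request from client 2 first.<br></p><p><a name="2bzDZ"></a></p><h3 id="准备（Prepare）阶段"><a href="#准备（Prepare）阶段" class="headerlink" title="准备（Prepare）阶段"></a>Prepare Phase</h3><ul><li>In the first phase, clients 1 and 2, acting as proposers, each send all acceptors a prepare request containing their proposal number:</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598719185899-5e8fb0b6-c519-44b5-861f-025dd4c32a8c.jpeg#align=left&display=inline&height=456&margin=%5Bobject%20Object%5D&name=640219532d0fcdffc08dbd1b3b3f0454.jpg&originHeight=456&originWidth=1142&size=172164&status=done&style=none&width=1142" alt="640219532d0fcdffc08dbd1b3b3f0454.jpg"><br><strong>Note that a prepare request does not specify a proposed value; it carries only the proposal number. This is a common point of confusion.</strong><br><br><br>Next, when nodes A and B receive the prepare request numbered 1 and node C receives the prepare request numbered 5, they handle them as follows:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598719234391-fe1f9e0a-7329-41fb-a633-18569f0ca35f.jpeg#align=left&display=inline&height=445&margin=%5Bobject%20Object%5D&name=5b6fcc5af76ad53e62c433e2589b6d7a.jpg&originHeight=445&originWidth=1142&size=200624&status=done&style=none&width=1142" alt="5b6fcc5af76ad53e62c433e2589b6d7a.jpg"></p><ul><li>Since they have not accepted any proposal yet, nodes A and B each return a "no proposal yet" response. In other words, they tell the proposer that they have not accepted any proposal, and promise from now on not to respond to prepare requests numbered less than or equal to 1 and not to accept proposals numbered less than 1.</li><li>Node C does the same: it returns a "no proposal yet" response and promises not to respond to prepare requests numbered less than or equal to 5 and not to accept proposals numbered less than 5.</li></ul><p><br>Then, when nodes A and B receive the prepare request numbered 5 and node C receives the prepare request numbered 1, they proceed as follows:<br><img 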
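src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598719296674-d4d102ce-e7f7-4576-97f1-50ef2c3f3581.jpeg#align=left&display=inline&height=439&margin=%5Bobject%20Object%5D&name=ecf9a5872201e875a2e0417c32ec2d24.jpg&originHeight=439&originWidth=1142&size=161224&status=done&style=none&width=1142" alt="ecf9a5872201e875a2e0417c32ec2d24.jpg"></p><ul><li>When nodes A and B receive the prepare request numbered 5, the number 5 is greater than 1, the number of the prepare request they answered earlier, and neither node has accepted any proposal, so each returns a "no proposal yet" response and promises not to respond to prepare requests numbered less than or equal to 5 and not to accept proposals numbered less than 5.</li><li>When node C receives the prepare request numbered 1, the number 1 is less than 5, the number of the prepare request it answered earlier, so it discards the request without responding.</li></ul><p><a name="IpM7R"></a></p><h3 id="接受（Accept）阶段"><a href="#接受（Accept）阶段" class="headerlink" title="接受（Accept）阶段"></a>Accept Phase</h3><p>In the second phase, the accept phase, clients 1 and 2 each send accept requests once they have received prepare responses from a majority of nodes:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598719518101-4e2fae87-9ab0-4485-8d17-3bb0eef8cc6c.jpeg#align=left&display=inline&height=434&margin=%5Bobject%20Object%5D&name=70de602cb4b52de7545f05c5485deb89.jpg&originHeight=434&originWidth=1142&size=169628&status=done&style=none&width=1142" alt="70de602cb4b52de7545f05c5485deb89.jpg"></p><ul><li>When client 1 receives prepare responses from a majority of acceptors (nodes A and B), it sets the value of its accept request to the value of the highest-numbered proposal in those responses. Because that value is empty in the prepare responses from both A and B (the "no proposal yet" in Figure 5), it uses its own proposed value 3 as the proposal's value and sends the accept request [1, 3].</li><li>When client 2 receives prepare responses from a majority of acceptors (nodes A, B, and C), it likewise sets the value of its accept request to the value of the highest-numbered proposal in the responses. Because that value is empty in the prepare responses from A, B, and C (the "no proposal yet" in Figures 5 and 6), it uses its own proposed value 7 as the proposal's value and sends the accept request [5, 7].</li></ul><p><br>When the three nodes receive the accept requests from the two clients, they handle them as follows:<br><img src="https://cdn.nlark.com/yuque/0/2020/jpeg/1072113/1598719574307-c8dbf077-2567-474d-ad54-8a8b806f3155.jpeg#align=left&display=inline&height=430&margin=%5Bobject%20Object%5D&name=f836c40636d26826fc04a51a5945d545.jpg&originHeight=430&originWidth=1142&size=146777&status=done&style=none&width=1142" alt="f836c40636d26826fc04a51a5945d545.jpg"></p><ul><li>When nodes A, B, and C receive the accept request [1, 3], its proposal number 1 is less than 5, the smallest proposal number the three nodes have promised to accept, so proposal [1, 3] is rejected.</li><li>When nodes A, B, and C receive the accept request [5, 7], its proposal number 5 is not less than 5, the smallest proposal number they have promised to accept, so they accept proposal [5, 7], that is, the value 7, and the three nodes reach consensus that X is 7.</li></ul><p><br>One more point: if the cluster has learners, an acceptor that accepts a proposal notifies all learners. When a learner sees that a majority of acceptors have accepted the same proposal, it accepts that proposal and its value as well.<br><br><br>As the walkthrough above shows, the nodes eventually reach consensus on the value of X. I also want to stress that the fault tolerance of Basic Paxos comes from the majority rule: as long as fewer than half of the nodes fail, consensus negotiation keeps working.<br></p><p><a name="Z4MHA"></a></p><h3 id="内容小结"><a href="#内容小结" class="headerlink" title="内容小结"></a>Summary</h3><p>This post covered how Basic Paxos works and some of its properties. Make sure these key points are clear. <strong>Basic Paxos reaches consensus in two phases (prepare, then accept), and a two-phase approach is a common way to reach consensus</strong>; you can consider it if you ever need to design a new consensus algorithm.<br><br><br>Besides consensus, Basic Paxos also provides fault tolerance: the cluster keeps working while fewer than half of its nodes have failed. Unlike distributed transaction algorithms, it does not require all nodes to agree before an operation is committed, because an "all nodes agree" rule makes the whole cluster unavailable as soon as a node fails. In other words, the "<strong>a majority of nodes agree</strong>" rule is what gives Basic Paxos its fault tolerance, letting it tolerate the failure of fewer than half of its nodes.<br><br><br>In essence, the proposal number represents priority. Based on proposal numbers, an acceptor keeps <strong>three promises</strong>:</p><ul><li>If a prepare request's proposal number is less than or equal to the proposal number of a prepare request the acceptor has already answered, the acceptor will not respond to it;</li><li>If the proposal number of the proposal in an accept request is less than the proposal number of a prepare request the acceptor has already answered, the acceptor will not accept that proposal;</li><li>If the acceptor has previously accepted a proposal, its response to a prepare request will include the highest-numbered proposal it has accepted.</li></ul><p><br>The problem Basic Paxos solves is how a distributed system agrees on some value X (a decision). A typical scenario: in distributed database storage, if every node starts in the same state and executes the same sequence of operations (in the example above, every node executes both SET(X, 3) and SET(X, 7), where client 1 sends SET(X, 3) to each node and client 2 sends SET(X, 7) to each node), the nodes are guaranteed to end up in the same state. For example, if A, B, and C all execute SET(X, 3) -&gt; SET(X, 7) in that order, X ends up as 7 on all three nodes.<br><br><br>Because the network is unreliable, however, the nodes may in practice execute the commands in different orders. If nodes A and B execute SET(X, 3) -&gt; SET(X, 7), X ends up as 7 on A and B, while node C, because of network delays, may execute SET(X, 7) -&gt; SET(X, 3) and end up with X = 3. So although the clients sent the same commands, the nodes end up with different values, which is exactly what we want to avoid. This is the problem Basic Paxos was created to solve.<br>Basic Paxos has to run one round of consensus for each command to ensure that every node sees the same commands.</p><!-- rebuild by neat -->]]></content>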
    
    <summary type="html">
    
      
      
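        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;strong&gt;Life has limits but knowledge has none; only by mastering the core principle, the “Dao”, can you meet endless change with constancy.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;a name=&quot;dIhdX&quot;&gt;&lt;/a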
      
    
    </summary>
    
    
      <category term="分布式" scheme="cpeixin.cn/categories/%E5%88%86%E5%B8%83%E5%BC%8F/"/>
    
    
      <category term="paxos" scheme="cpeixin.cn/tags/paxos/"/>
    
  </entry>
  
  <entry>
    <title>Flink State TTL Cleanup, Viewed Through the Source Code</title>
    <link href="cpeixin.cn/2020/08/14/Flink%E4%BB%8E%E6%BA%90%E7%A0%81%E7%9C%8B%E7%8A%B6%E6%80%81%E5%AE%9A%E6%97%B6%E6%B8%85%E7%90%86/"/>
    <id>cpeixin.cn/2020/08/14/Flink%E4%BB%8E%E6%BA%90%E7%A0%81%E7%9C%8B%E7%8A%B6%E6%80%81%E5%AE%9A%E6%97%B6%E6%B8%85%E7%90%86/</id>
    <published>2020-08-14T07:19:54.000Z</published>
    <updated>2020-10-09T07:21:30.324Z</updated>
    
    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p>State is one of Flink's key features, whether it is kept in memory or in RocksDB. Sometimes, though, we need to clean state up instead of keeping it for too long; otherwise it grows too large. That is where Flink's state cleanup comes in.<br></p><p><a name="2kNyt"></a></p><h3 id="为什么需要清理状态"><a href="#为什么需要清理状态" class="headerlink" title="为什么需要清理状态"></a>Why State Needs to Be Cleaned Up</h3><p>First, state in Flink is often time-sensitive: it is valid only for a certain period, and once that point has passed it is of little value. Second, we need to keep Flink's state size under control so that ever-growing state remains manageable.<br><br><br>Flink 1.6 introduced state time-to-live, and Flink 1.8 added TTL-based cleanup of expired state, so that state can be cleaned up from within the program. Otherwise we would have to rely on extra operations to clean up state, which is error-prone and hard to control.<br><br><br>Apache Flink 1.6.0 introduced the State TTL feature. It lets developers of stream-processing applications configure an expiration time for state, which is cleaned up once that time to live has elapsed. Flink 1.8.0 extended the feature with continuous cleanup of old entries for both the RocksDB and the heap state backends (FSStateBackend and MemoryStateBackend), according to the TTL settings.<br><br><br>State in a Flink program is defined through a state descriptor, so TTL is configured by building a StateTtlConfig object and passing it to the state descriptor (via its enableTimeToLive method), which then takes care of cleaning up the state.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span 
class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">StateDemo</span> </span>&#123;</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        LocalStreamEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();</span><br><span class="line">        <span class="comment">// this can be used in a 
streaming program like this (assuming we have a StreamExecutionEnvironment env)</span></span><br><span class="line">        env.fromElements(Tuple2.of(<span class="number">1L</span>, <span class="number">3L</span>), Tuple2.of(<span class="number">1L</span>, <span class="number">5L</span>), Tuple2.of(<span class="number">1L</span>, <span class="number">10L</span>), Tuple2.of(<span class="number">1L</span>, <span class="number">4L</span>), Tuple2.of(<span class="number">1L</span>, <span class="number">2L</span>))</span><br><span class="line">                .keyBy(<span class="number">0</span>)</span><br><span class="line">                .flatMap(<span class="keyword">new</span> MyFlatMapFunction())</span><br><span class="line">                .print();</span><br><span class="line"></span><br><span class="line">        <span class="comment">// the printed output will be (1,4) and (1,7): (3+5)/2 = 4 and (10+4)/2 = 7</span></span><br><span class="line">        env.execute();</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">MyFlatMapFunction</span> <span class="keyword">extends</span> <span class="title">RichFlatMapFunction</span>&lt;<span class="title">Tuple2</span>&lt;<span class="title">Long</span>, <span class="title">Long</span>&gt;, <span class="title">Tuple2</span>&lt;<span class="title">Long</span>, <span class="title">Long</span>&gt;&gt; </span>&#123;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="keyword">long</span> serialVersionUID = <span class="number">1808329479322205953L</span>;</span><br><span class="line">    <span class="comment">/**</span></span><br><span class="line"><span class="comment">     * The ValueState handle. 
The first field is the count, the second field a running sum.</span></span><br><span class="line"><span class="comment">     */</span></span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">transient</span> ValueState&lt;Tuple2&lt;Long, Long&gt;&gt; sum;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// State expiry (TTL) cleanup</span></span><br><span class="line">    <span class="comment">// Flink cleans up expired state lazily: state we read may already have expired but not yet been deleted. We can configure</span></span><br><span class="line">    <span class="comment">// whether such expired values are returned; either way, expired state is removed right after it is read. As of 1.8.0,</span></span><br><span class="line">    <span class="comment">// TTL is based on processing time only; event time is not supported yet and may be added in a later version.</span></span><br><span class="line"></span><br><span class="line">    <span class="comment">// Internally, state TTL stores the timestamp of the last relevant access alongside the state value. This adds storage</span></span><br><span class="line">    <span class="comment">// overhead, but lets Flink check for expired state when reading data and when taking checkpoints.</span></span><br><span class="line">    StateTtlConfig ttlConfig =</span><br><span class="line">            StateTtlConfig.newBuilder(Time.days(<span class="number">1</span>)) <span class="comment">// the time-to-live value</span></span><br><span class="line">                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)</span><br><span class="line">                    <span class="comment">// state visibility: whether a read may return an expired value</span></span><br><span class="line"><span class="comment">//            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)</span></span><br><span class="line">                    .cleanupFullSnapshot() <span class="comment">// remove expired entries when taking a full snapshot</span></span><br><span class="line">                    .build();</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span 
class="title">flatMap</span><span class="params">(Tuple2&lt;Long, Long&gt; input, Collector&lt;Tuple2&lt;Long, Long&gt;&gt; out)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// access the state value</span></span><br><span class="line">        Tuple2&lt;Long, Long&gt; currentSum = sum.value();</span><br><span class="line"></span><br><span class="line">        <span class="comment">// update the count</span></span><br><span class="line">        currentSum.f0 += <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// add the second field of the input value</span></span><br><span class="line">        currentSum.f1 += input.f1;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// update the state</span></span><br><span class="line">        sum.update(currentSum);</span><br><span class="line"></span><br><span class="line">        <span class="comment">// if the count reaches 2, emit the average and clear the state</span></span><br><span class="line">        <span class="keyword">if</span> (currentSum.f0 &gt;= <span class="number">2</span>) &#123;</span><br><span class="line">            out.collect(<span class="keyword">new</span> Tuple2&lt;&gt;(input.f0, currentSum.f1 / currentSum.f0));</span><br><span class="line">            sum.clear();</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">open</span><span class="params">(Configuration config)</span> </span>&#123;</span><br><span class="line">        ValueStateDescriptor&lt;Tuple2&lt;Long, Long&gt;&gt; descriptor =</span><br><span 
class="line">                <span class="keyword">new</span> ValueStateDescriptor&lt;&gt;(</span><br><span class="line">                        <span class="string">"average"</span>, <span class="comment">// the state name</span></span><br><span class="line">                        TypeInformation.of(<span class="keyword">new</span> TypeHint&lt;Tuple2&lt;Long, Long&gt;&gt;() &#123;</span><br><span class="line">                        &#125;), <span class="comment">// type information</span></span><br><span class="line">                        Tuple2.of(<span class="number">0L</span>, <span class="number">0L</span>)); <span class="comment">// default value of the state, if nothing was set</span></span><br><span class="line"></span><br><span class="line">        <span class="comment">// enable TTL for this state</span></span><br><span class="line">        descriptor.enableTimeToLive(ttlConfig);</span><br><span class="line">        sum = getRuntimeContext().getState(descriptor);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code above defines a StateTtlConfig, the core configuration class that describes how state expires and is cleaned up. Flink offers several options that control the behavior of the TTL feature.<br><a name="DY1kJ"></a></p><h3 id="状态的时间什么被修改？"><a href="#状态的时间什么被修改？" class="headerlink" title="状态的时间什么被修改？"></a>When is the state's TTL timestamp updated?</h3><p>By default, the expiration timestamp of a state entry is refreshed whenever the state is modified. Optionally, it can also be refreshed when the state is read, at the cost of an extra write to update the timestamp.<br><a name="eBskO"></a></p><p><a name="hFX6y"></a></p><h3 id="过去的状态数据是否可以访问？"><a href="#过去的状态数据是否可以访问？" class="headerlink" title="过去的状态数据是否可以访问？"></a>Can expired state still be accessed?</h3><p>State TTL cleans up expired state lazily, so an application may try to read state that has expired but has not been deleted yet. We can configure whether such a read returns the expired value. In either case, the expired state is removed immediately after it is accessed.<br><strong>Note</strong><br>With Flink 1.8.0, state TTL can only be defined in terms of processing time. Support for event time is planned for a future Apache Flink release.<br>Internally, state TTL is implemented by storing the timestamp of the last relevant access together with the actual state value. This adds some storage overhead, but it allows Flink to check whether state has expired during data access, checkpointing, and recovery.<br><a name="LE3vG"></a></p><h3 id="-1"><a href="#-1" 
class="headerlink"></a></h3><p><a name="eNuVe"></a></p><h3 id="如何避免读取过期数据？"><a href="#如何避免读取过期数据？" class="headerlink" title="如何避免读取过期数据？"></a>How to avoid reading expired data?</h3><p>When a state object is accessed in a read operation, Flink checks its timestamp and clears the state if it has expired (whether the expired value is returned depends on the configured state visibility). Because of this lazy removal, expired state that is never accessed again would occupy storage forever unless it is garbage-collected.<br><br><br>So how can expired state be removed without the application logic handling it explicitly? In general, we can configure one of several strategies that remove it in the background.<br></p><p><a name="J8Yh9"></a></p><h3 id="完整快照自动删除过期状态"><a href="#完整快照自动删除过期状态" class="headerlink" title="完整快照自动删除过期状态"></a>Automatic removal of expired state in full snapshots</h3><p>Since Flink 1.6.0, expired state can be removed automatically when a full snapshot is taken for a checkpoint or savepoint. Note that this cleanup does not apply to incremental checkpoints, and it must be enabled explicitly:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">StateTtlConfig ttlConfig &#x3D;</span><br><span class="line">            StateTtlConfig.newBuilder(Time.days(1)) &#x2F;&#x2F; the time-to-live value</span><br><span class="line">                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)</span><br><span class="line">                    .cleanupFullSnapshot() &#x2F;&#x2F; remove expired entries when taking a full snapshot</span><br><span class="line">                    .build();</span><br><span class="line">&#x2F;** Cleanup expired state in full snapshot on checkpoint. 
*&#x2F;</span><br><span class="line">@Nonnull</span><br><span class="line">public Builder cleanupFullSnapshot() &#123;</span><br><span class="line">cleanupStrategies.activate(CleanupStrategies.Strategies.FULL_STATE_SCAN_SNAPSHOT);</span><br><span class="line">return this;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong><br></strong>堆状态后端的增量清理**<br>此方法特定于堆状态后端（FSStateBackend和MemoryStateBackend）。它的实现方法是存储后端在所有状态条目上维护一个惰性全局迭代器。某些事件（例如状态访问）会触发增量清理。每次触发增量清理时，迭代器都会向前迭代删除已遍历的过期数据。以下代码示例演示如何启用增量清理</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">StateTtlConfig ttlConfig &#x3D;</span><br><span class="line">            StateTtlConfig.newBuilder(Time.days(1)) &#x2F;&#x2F;它是生存时间值</span><br><span class="line">                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)</span><br><span class="line">                    .cleanupIncrementally(15,false)&#x2F;&#x2F;</span><br><span class="line">                    .build();</span><br></pre></td></tr></table></figure><p><br>如果启用，则每次进行状态访问都会触发清理步骤。对于每个清理步骤，都会检查一定数量的数据是否过期。<br>参数说明：<br>第一个参数是检查每个清理步骤的状态条目数。<br>第二个参数是一个标志，用于数据处理后触发清理步骤，此外对于每次状态访问同样有效。<br><strong><br></strong>注意**<br>第一个是增量清理所花费的时间增加了数据处理延迟。<br>第二个应该可以忽略不计，但仍然值得一提：如果没有状态访问或没有数据处理记录，则不会删除过期状态<br></p><p><a name="6kRbu"></a></p><h3 id="RocksDB后台压缩可以过滤掉过期状态"><a href="#RocksDB后台压缩可以过滤掉过期状态" class="headerlink" title="RocksDB后台压缩可以过滤掉过期状态"></a>RocksDB后台压缩可以过滤掉过期状态</h3><p>如果你的Flink应用程序使用RocksDB作为状态后端存储，则可以启用另一个基于Flink特定压缩过滤器的清理策略。RocksDB定期运行异步压缩以合并状态更新并减少存储。Flink压缩过滤器使用TTL检查状态条目的到期时间戳，并丢弃所有过期值。<br><br><br>激活此功能的第一步是通过设置以下Flink配置选项来配置RocksDB状态后端</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">state.backend.rocksdb.ttl.compaction.filter.enabled: true</span><br></pre></td></tr></table></figure><p>Once the RocksDB state backend is configured, the compaction cleanup strategy is enabled for a state as shown in the following code:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">StateTtlConfig ttlConfig &#x3D;</span><br><span class="line">            StateTtlConfig.newBuilder(Time.days(1)) &#x2F;&#x2F; the time-to-live value</span><br><span class="line">                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)</span><br><span class="line">                    .cleanupInRocksdbCompactFilter() &#x2F;&#x2F; clean up during RocksDB's periodic compaction</span><br><span class="line">                    .build();</span><br></pre></td></tr></table></figure><p><strong>Cleanup with timers</strong><br>Another way to clean up state manually is based on Flink timers. This is an idea the community is currently evaluating for future releases. With this approach, a cleanup timer is registered for every state access. It is more predictable, because state is removed as soon as it expires, but it is also more expensive, because the timers consume storage in addition to the state itself and state has to be read frequently.<br><br><strong>StateTtlConfig</strong><br>This class configures the state TTL logic.<br>The UpdateType enum configures when the last-access timestamp is refreshed, which extends the state's TTL:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">public enum UpdateType &#123;</span><br><span class="line">Disabled,&#x2F;&#x2F; TTL disabled: state never expires</span><br><span class="line">OnCreateAndWrite,&#x2F;&#x2F; refreshed on creation and on every write</span><br><span class="line">OnReadAndWrite&#x2F;&#x2F; also refreshed on every read</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>The StateVisibility enum configures whether an expired user value may be returned:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br></pre></td><td class="code"><pre><span class="line">public enum StateVisibility &#123;</span><br><span class="line">ReturnExpiredIfNotCleanedUp,&#x2F;&#x2F; return the expired value if it has not been cleaned up yet</span><br><span class="line">NeverReturnExpired&#x2F;&#x2F; never return an expired value</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>CleanupStrategies is the class that holds the TTL cleanup strategies; its nested enum Strategies lists the available strategies:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">enum Strategies &#123;</span><br><span class="line">FULL_STATE_SCAN_SNAPSHOT,&#x2F;&#x2F; full-state scan when taking a snapshot</span><br><span class="line">INCREMENTAL_CLEANUP,&#x2F;&#x2F; incremental cleanup</span><br><span class="line">ROCKSDB_COMPACTION_FILTER&#x2F;&#x2F; RocksDB compaction filter</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>The CleanupStrategy interface<br>This is an empty marker interface implemented by the configuration classes of the individual cleanup strategies. It has three implementations:<br>EmptyCleanupStrategy: an empty class with no content<br>IncrementalCleanupStrategy: configuration for the incremental cleanup strategy<br>RocksdbCompactFilterCleanupStrategy: configuration for the RocksDB compaction-filter strategy<br><br><br>Core fields of StateTtlConfig:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">private UpdateType updateType &#x3D; OnCreateAndWrite;</span><br><span class="line">private StateVisibility stateVisibility &#x3D; NeverReturnExpired;</span><br><span class="line">private TimeCharacteristic timeCharacteristic &#x3D; ProcessingTime;</span><br><span class="line">private Time ttl;</span><br><span class="line">private CleanupStrategies cleanupStrategies &#x3D; new CleanupStrategies();</span><br></pre></td></tr></table></figure><p>Notes:<br>StateTtlConfig 
has the five main fields shown above:<br>updateType: when the last-access timestamp is refreshed<br>stateVisibility: whether a value is returned to the user after the state has expired<br>timeCharacteristic: the time characteristic; currently state TTL can only be defined on processing time<br>ttl: the time-to-live duration<br>cleanupStrategies: the cleanup strategies, configured through three classes: EmptyCleanupStrategy, IncrementalCleanupStrategy, and RocksdbCompactFilterCleanupStrategy</p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
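        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;State is an important feature of Flink, whether it is kept in memory or in RocksDB. Sometimes, however, we need to clean up state instead of keeping it for too long; otherwise the state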
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="Flink" scheme="cpeixin.cn/tags/Flink/"/>
    
  </entry>
  
  <entry>
    <title>Flink Failure Recovery and Restart Strategies</title>
    <link href="cpeixin.cn/2020/08/08/Flink-%E6%95%85%E9%9A%9C%E6%81%A2%E5%A4%8D%E5%92%8C%E9%87%8D%E5%90%AF%E7%AD%96%E7%95%A5/"/>
    <id>cpeixin.cn/2020/08/08/Flink-%E6%95%85%E9%9A%9C%E6%81%A2%E5%A4%8D%E5%92%8C%E9%87%8D%E5%90%AF%E7%AD%96%E7%95%A5/</id>
    <published>2020-08-08T14:02:55.000Z</published>
    <updated>2020-10-08T14:04:30.988Z</updated>
    
    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p>Automatic failure recovery is a powerful Flink feature. In production, applications can crash for many reasons, such as malformed data or network jitter, and Flink provides configurable failover and restart strategies to recover from them automatically.<br><a name="gE8FE"></a></p><h3 id="故障恢复"><a href="#故障恢复" class="headerlink" title="故障恢复"></a>Failover</h3><p>Flink's configuration file contains the parameter jobmanager.execution.failover-strategy: region.<br><br><br>Flink supports failover strategies at different granularities; jobmanager.execution.failover-strategy accepts two values: full and region.<br><br><br>With the full strategy, the failure of any Task restarts every Task of the job. In production, a large job may have hundreds of Tasks, so restarting the whole job for a single exception can keep it from working properly for a long time and cause data latency.<br><br><br>From graph theory: if the ExecutionGraph is disconnected (that is, it can be split into several independent pipelines), then when a Task fails we only need to trace back to the Source of the connected component containing that Task and restart the Tasks of that component, leaving all other Tasks untouched, as shown below. Each connected component is then a Region.<br><img src="//upload-images.jianshu.io/upload_images/195230-17de2210f9e192ed.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1046/format/webp#align=left&display=inline&height=832&margin=%5Bobject%20Object%5D&originHeight=832&originWidth=1046&status=done&style=none&width=1046" alt><br>The idea is easy to understand, but it does not help when the ExecutionGraph itself is connected, because all Tasks still have to be restarted, as shown below.<br><img src="//upload-images.jianshu.io/upload_images/195230-f90157919c537ee1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1034/format/webp#align=left&display=inline&height=832&margin=%5Bobject%20Object%5D&originHeight=832&originWidth=1034&status=done&style=none&width=1034" alt><br>Flink therefore adds a further optimization for this case: the intermediate result produced by a Task with one-to-many downstream dependencies is cached. When a downstream Task fails and restarts, recovery only has to go back to that cached intermediate result instead of the Source, which further reduces the number of restarted Tasks. All downstream Tasks starting from a cached intermediate result then form one Region. This is easier to see in a picture than to describe in words:<br><img src="//upload-images.jianshu.io/upload_images/195230-aec4c532733dac52.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1154/format/webp#align=left&display=inline&height=898&margin=%5Bobject%20Object%5D&originHeight=898&originWidth=1154&status=done&style=none&width=1154" alt><br>Note the black boxes after B1 and B2.<br>Of course, if the failed Task is close to the Source, or the cached intermediate result is no longer available, this does not work, and the job has to be restarted from the Source.<br><br><br>So when one or a few Tasks in the cluster fail, only the affected part needs to be restarted. This is Flink's <strong>region-based local failover strategy</strong>: Flink divides the job into Regions, and when a Task fails it computes the smallest set of Regions that must be recovered.<br><br><br>To decide which Regions to restart, Flink applies the following rules:</p><ul><li>The Region containing the failed Task must be restarted;</li><li>If the data the current Region depends on is corrupted or partially lost, the Region that produces that data must also be restarted;</li><li>To keep the data consistent, the Regions downstream of the current Region must also be restarted.</li></ul><p><br>Source code related to the job restart strategy<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602165340893-ea0ef5f5-c568-4dc1-8bb1-f10f7c3c9c81.png#align=left&display=inline&height=1626&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-10-08%20%E4%B8%8B%E5%8D%889.54.50.png&originHeight=1626&originWidth=2156&size=527292&status=done&style=none&width=2156" alt="截屏2020-10-08 下午9.54.50.png"><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602165346343-68162411-b3d8-48a3-a10b-ac8040abe41d.png#align=left&display=inline&height=1564&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-10-08%20%E4%B8%8B%E5%8D%889.55.00.png&originHeight=1564&originWidth=2154&size=523720&status=done&style=none&width=2154" alt="截屏2020-10-08 下午9.55.00.png"><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602165350491-c55081c8-aa73-42fb-be99-5d92a4f4c212.png#align=left&display=inline&height=854&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-10-08%20%E4%B8%8B%E5%8D%889.55.10.png&originHeight=854&originWidth=1528&size=180780&status=done&style=none&width=1528" alt="截屏2020-10-08 下午9.55.10.png"><br>Source code related to the task failover strategy<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602165461282-ab9b8935-0fef-4349-b09d-b80c9bcc860f.png#align=left&display=inline&height=1608&margin=%5Bobject%20Object%5D&name=%E6%88%AA%E5%B1%8F2020-10-08%20%E4%B8%8B%E5%8D%889.57.30.png&originHeight=1608&originWidth=2126&size=523231&status=done&style=none&width=2126" alt="截屏2020-10-08 下午9.57.30.png"><br></p><p><a name="fLypk"></a></p><h3 id="重启策略"><a href="#重启策略" class="headerlink" title="重启策略"></a>Restart strategies</h3><p>Flink provides several types and levels of restart strategy. The commonly used ones are:</p><ul><li>Fixed-delay restart strategy</li><li>Failure-rate restart strategy</li><li>No restart strategy</li></ul><p><br>When no restart strategy is set, Flink falls back to a default: if checkpointing is enabled, <strong>the fixed-delay restart strategy is used</strong>; <strong>if checkpointing is not enabled, the job is not restarted.</strong><br><br><br>The three strategies are described in detail below.<br><a name="xSnev"></a></p><h4 id="无重启策略模式"><a href="#无重启策略模式" class="headerlink" title="无重启策略模式"></a>No restart strategy</h4><p>With this strategy, the job exits directly when an error occurs.<br>It can be configured in flink-conf.yaml:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">restart-strategy: none</span><br></pre></td></tr></table></figure><p><br>or set in code:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">final ExecutionEnvironment env &#x3D; ExecutionEnvironment.getExecutionEnvironment();</span><br><span class="line">env.setRestartStrategy(RestartStrategies.noRestart());</span><br></pre></td></tr></table></figure><p><a name="xQIJj"></a></p><h4 id="固定延迟重启策略模式"><a href="#固定延迟重启策略模式" class="headerlink" title="固定延迟重启策略模式"></a>Fixed-delay restart strategy</h4><p>The fixed-delay restart strategy is enabled by setting the following parameter in flink-conf.yaml:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">restart-strategy: fixed-delay</span><br></pre></td></tr></table></figure><p><br>It takes two parameters: Flink retries up to the configured number of attempts and waits the configured delay between consecutive attempts, as shown in the table below:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602164492869-016a4628-8225-443b-9794-2bd9159c2864.png#align=left&display=inline&height=312&margin=%5Bobject%20Object%5D&originHeight=312&originWidth=1357&size=0&status=done&style=none&width=1357" alt><br>For example, to retry the job 3 times with 5 seconds between attempts, use the following configuration:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br></pre></td><td class="code"><pre><span class="line">restart-strategy.fixed-delay.attempts: 3</span><br><span class="line">restart-strategy.fixed-delay.delay: 5 s</span><br></pre></td></tr></table></figure><p><br>Of course, this can also be set in code:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">env.setRestartStrategy(RestartStrategies.fixedDelayRestart(</span><br><span class="line">        3, &#x2F;&#x2F; number of restart attempts</span><br><span class="line">        Time.of(5, TimeUnit.SECONDS) &#x2F;&#x2F; delay between attempts</span><br><span class="line">));</span><br></pre></td></tr></table></figure><p><a name="PsLT9"></a></p><h4 id="失败率重启策略模式"><a href="#失败率重启策略模式" class="headerlink" title="失败率重启策略模式"></a>Failure-rate restart strategy</h4><p>First, set the following in flink-conf.yaml:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">restart-strategy: failure-rate</span><br></pre></td></tr></table></figure><p><br>This strategy takes three parameters, listed in the table below. The failure-rate restart strategy restarts the job after a failure, but once the failure rate is exceeded, the job is finally declared failed. Between two consecutive restart attempts, the strategy waits a fixed delay.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602164553310-87ad79da-ca70-429c-8cc4-6e505b68b87f.png#align=left&display=inline&height=407&margin=%5Bobject%20Object%5D&originHeight=407&originWidth=1534&size=0&status=done&style=none&width=1534" alt><br>This configuration is harder to grasp, so here is an example: the job should be considered failed if it fails 3 times within 5 minutes, and restart attempts are 5 seconds apart.<br><br><br>The configuration is then:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">restart-strategy.failure-rate.max-failures-per-interval: 3</span><br><span class="line">restart-strategy.failure-rate.failure-rate-interval: 5 min</span><br><span 
class="line">restart-strategy.failure-rate.delay: 5 s</span><br></pre></td></tr></table></figure><p><br>It can also be specified directly in code:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">env.setRestartStrategy(RestartStrategies.failureRateRestart(</span><br><span class="line">        3, &#x2F;&#x2F; maximum number of failures per interval</span><br><span class="line">        Time.of(5, TimeUnit.MINUTES), &#x2F;&#x2F; interval over which the failure rate is measured</span><br><span class="line">        Time.of(5, TimeUnit.SECONDS) &#x2F;&#x2F; delay between two consecutive restart attempts</span><br><span class="line">));</span><br></pre></td></tr></table></figure><p><br>Finally, note that <strong>because each job in production has its own load and resource consumption, we recommend specifying the retry and restart strategy for each job in its code</strong>.<br></p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;Automatic failure recovery is a powerful Flink feature. In production we run into many kinds of problems that crash applications, such as the malformed data and network
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="Flink" scheme="cpeixin.cn/tags/Flink/"/>
    
  </entry>
  
  <entry>
    <title>Why Redis Is Fast</title>
    <link href="cpeixin.cn/2020/08/01/Redis%E4%B8%BA%E4%BB%80%E4%B9%88%E9%AB%98%E6%95%88/"/>
    <id>cpeixin.cn/2020/08/01/Redis%E4%B8%BA%E4%BB%80%E4%B9%88%E9%AB%98%E6%95%88/</id>
    <published>2020-08-01T08:08:08.000Z</published>
    <updated>2020-08-30T08:11:27.667Z</updated>
    
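    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><a name="dYleE"></a></p><h3 id="为什么高效？"><a href="#为什么高效？" class="headerlink" title="为什么高效？"></a>Why is it fast?</h3><ol><li>Redis is a non-relational in-memory database: data lives in memory, which is very fast to read, so for simple key-value access memory is not the bottleneck. A hash lookup can typically run on the order of millions of times per second.</li><li>It handles commands on a single thread, which avoids unnecessary context switches and race conditions.</li><li>Internally it uses epoll together with its own simple event framework. Reads, writes, closes, and new connections on epoll are all turned into events, and the <strong>I/O multiplexing of epoll</strong> means no time is wasted blocking on I/O.</li><li>Redis operations are very fast because all data is in memory and no disk access is needed. As for concurrency, Redis uses I/O multiplexing, so handling many concurrent clients is not a problem.</li></ol><p><br>Of course, a single Redis process cannot use multiple cores (at any moment it runs on only one CPU core), but Redis is not a very CPU-intensive service in the first place. If one core is not enough, you can run several Redis processes.<br><br><br>Redis's single-threaded model guarantees that each operation is atomic and also reduces thread context switching and contention.<br><br><br>The data structures help as well. The keyspace is stored in a hash table, so lookups are fast, and some special data structures optimize storage: for example, the compressed list (ziplist) stores small collections compactly, and the skiplist keeps sorted-set members in order, which speeds up ordered lookups.<br><br><br>Finally, Redis uses its own event demultiplexer, which is efficient; execution is non-blocking internally, which gives high throughput.<br></p><p><a name="gAQU5"></a></p><h3 id="Q-amp-A"><a href="#Q-amp-A" class="headerlink" title="Q&amp;A"></a>Q&amp;A</h3><p><strong>Q</strong>: <strong>Why does Redis keep all data in memory?</strong><br><strong>A</strong>: To achieve the fastest possible reads and writes, Redis keeps all data in memory and writes it to disk asynchronously, so it is both fast and persistent. If the data were not in memory, disk I/O speed would severely limit Redis's performance. As memory keeps getting cheaper, Redis will only become more popular. If a maximum memory (maxmemory) is configured and the eviction policy is noeviction (the default), writes that need more memory are rejected once the limit is reached.<br><br><br><strong>Q: What are the benefits of using <a href="http://lib.csdn.net/base/redis" target="_blank" rel="external nofollow noopener noreferrer">Redis</a>?</strong><br><strong>A:</strong><br>(1) It is fast: data lives in memory in a structure similar to a HashMap, whose lookups and updates take O(1) time.<br>(2) It supports rich data types: string, list, set, sorted set, and hash.<br>(3) It supports transactions: every single command is atomic, and a MULTI/EXEC transaction runs its queued commands in order without interleaving commands from other clients. Note that Redis does not roll back a transaction if one of its commands fails at runtime.<br>(4) It has rich features: it can be used as a cache or for messaging, and keys can be given an expiration time after which they are deleted automatically.<br><br><br><strong>Q: Why is Redis single-threaded?</strong><br><strong>A:</strong> According to the official explanation, because Redis operates in memory, the CPU is not its bottleneck; the bottleneck is most likely the machine's memory size or network bandwidth. Since a single thread is easy to implement, multithreading is more complex, and the CPU will not be the bottleneck, a single-threaded design was the natural choice.<br><br><br><strong>Note: the "single thread" emphasized here only means that network requests are handled by one thread. A running Redis server certainly uses more than one thread! For example, persistence (RDB snapshots and AOF rewrites) runs in a forked child process, and Redis uses background threads for tasks such as closing files, fsync, and lazy freeing.</strong></p><!-- rebuild by neat 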
-->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;a name=&quot;dYleE&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h3 id=&quot;为什么高效？&quot;&gt;&lt;a href=&quot;#为什么高效？&quot; class=&quot;headerl
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="redis" scheme="cpeixin.cn/tags/redis/"/>
    
  </entry>
  
  <entry>
    <title>Flink Distributed Cache Demo</title>
    <link href="cpeixin.cn/2020/07/30/Flink-%E5%88%86%E5%B8%83%E5%BC%8F%E7%BC%93%E5%AD%98-demo/"/>
    <id>cpeixin.cn/2020/07/30/Flink-%E5%88%86%E5%B8%83%E5%BC%8F%E7%BC%93%E5%AD%98-demo/</id>
    <published>2020-07-30T13:08:09.000Z</published>
    <updated>2020-10-08T13:30:38.459Z</updated>
    
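    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><a name="aDwVA"></a></p><h3 id="分布式缓存"><a href="#分布式缓存" class="headerlink" title="分布式缓存"></a>Distributed cache</h3><p>Flink provides a distributed cache, similar to Hadoop's, that lets user functions running in parallel <strong>conveniently read local files: Flink places the file on each TaskManager node so that tasks do not fetch it repeatedly.</strong><br><br>The cache works as follows: the program registers a file or directory (on a local or remote file system, such as HDFS or S3) as a cached file through the ExecutionEnvironment and gives it a name.<br><br><br><strong>When the program runs, Flink automatically copies the file or directory to the local file system of every TaskManager node, only once.</strong> User functions can look up the file or directory by its registered name and access it from the TaskManager node's local file system.<br><br><br><br><a name="bbb2A"></a></p><h3 id="Broadcast-广播变量"><a href="#Broadcast-广播变量" class="headerlink" title="Broadcast 广播变量"></a>Broadcast variables</h3><p>In one sentence, a broadcast variable is a shared variable: <strong>we can broadcast a DataSet, and tasks on every node can then read it, with only one copy of the data kept per node. Without broadcasting, every task on every node needs its own copy of the DataSet, which wastes memory (a node may hold several copies of the same data).</strong><br>Notes:<br>1: A broadcast variable lives in the memory of every node, so the data set must not be too large, or you risk an OOM. Broadcast data stays in memory until the program finishes.<br>2: Once broadcast, a broadcast variable cannot be modified; this guarantees that every node sees the same data.<br>Personal advice: a data set of tens of megabytes up to around a hundred megabytes can reasonably be broadcast; once it reaches gigabytes, broadcasting is not recommended.<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602159608737-04fb8b6b-3146-4e94-99e5-1b39c4793b89.png#align=left&display=inline&height=527&margin=%5Bobject%20Object%5D&originHeight=527&originWidth=992&size=0&status=done&style=none&width=992" alt><br><a name="2w6yd"></a></p><p><a name="qAmd1"></a></p><h3 id="区别"><a href="#区别" class="headerlink" title="区别"></a>Differences</h3><p>1. Broadcast variables are memory-based: the variable is distributed into the memory of each worker node (avoiding repeated copies and saving memory).<br>2. The distributed cache is disk-based: files are copied to each node, and functions can look them up in the local file system at run time (avoiding repeated copies and improving execution efficiency).<br></p><p><a name="aR14Z"></a></p><h3 id="Code"><a href="#Code" class="headerlink" title="Code"></a>Code</h3><p>Next, let's look at a simple usage example:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 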
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.apache.commons.io.FileUtils;</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.common.functions.MapFunction;</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.common.functions.RichMapFunction;</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.java.DataSet;</span><br><span 
class="line"><span class="keyword">import</span> org.apache.flink.api.java.ExecutionEnvironment;</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.java.operators.DataSource;</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.configuration.Configuration;</span><br><span class="line"><span class="keyword">import</span> java.io.File;</span><br><span class="line"><span class="keyword">import</span> java.util.ArrayList;</span><br><span class="line"><span class="keyword">import</span> java.util.List;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * Distributed cache</span></span><br><span class="line"><span class="comment"> *  Step 1: register a file with the env and give it a name; the file can come from the local file system or from HDFS.</span></span><br><span class="line"><span class="comment"> *  Step 2: when using the distributed cache, retrieve the file directly by its registered name.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">DistributeCache</span> </span>&#123;</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">final</span> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();</span><br><span class="line">        <span class="comment">// 1: register a file under a name; it can be a file on HDFS or, for testing, a local file</span></span><br><span class="line">        env.registerCachedFile(<span class="string">"/Users/cpeixin/cache/distributedcache.txt"</span>, <span class="string">"distributedCache"</span>);</span><br><span class="line">    
    DataSource&lt;String&gt; data = env.fromElements(<span class="string">"Linea"</span>, <span class="string">"Lineb"</span>, <span class="string">"Linec"</span>, <span class="string">"Lined"</span>);</span><br><span class="line"></span><br><span class="line">        <span class="comment">// Besides the methods of MapFunction, a RichFunction also provides open, close, getRuntimeContext and setRuntimeContext,</span></span><br><span class="line">        <span class="comment">// which can be used to parameterize the function (pass in parameters), create and finalize local state, access broadcast variables, and access runtime and iteration information.</span></span><br><span class="line">        DataSet&lt;String&gt; result = data.map(<span class="keyword">new</span> RichMapFunction&lt;String, String&gt;() &#123;</span><br><span class="line">            <span class="keyword">private</span> ArrayList&lt;String&gt; dataList = <span class="keyword">new</span> ArrayList&lt;String&gt;();</span><br><span class="line"></span><br><span class="line">            <span class="meta">@Override</span></span><br><span class="line">            <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">open</span><span class="params">(Configuration parameters)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">                <span class="keyword">super</span>.open(parameters);</span><br><span class="line">                <span class="comment">//2: use the cached file</span></span><br><span class="line">                File myFile = getRuntimeContext().getDistributedCache().getFile(<span class="string">"distributedCache"</span>);</span><br><span class="line">                List&lt;String&gt; lines = FileUtils.readLines(myFile);</span><br><span class="line">                <span class="keyword">for</span> (String line : lines) &#123;</span><br><span class="line">                    dataList.add(line);</span><br><span class="line">                    System.err.println(<span class="string">"Distributed cache line: "</span> + line);</span><br><span class="line">     
           &#125;</span><br><span class="line">            &#125;</span><br><span class="line"></span><br><span class="line">            <span class="meta">@Override</span></span><br><span class="line">            <span class="function"><span class="keyword">public</span> String <span class="title">map</span><span class="params">(String value)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">                <span class="comment">//dataList, filled in open(), can be used here</span></span><br><span class="line">                System.err.println(<span class="string">"Using dataList: "</span> + dataList + <span class="string">"-------"</span> +value);</span><br><span class="line">                <span class="comment">//business logic</span></span><br><span class="line">                <span class="keyword">return</span> dataList +<span class="string">": "</span> +  value;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;);</span><br><span class="line">        result.printToErr();</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>Below is an implementation closer to a production scenario. It uses the DistributedCache in a streaming job, and the whole process is similar to joining the stream with dimension data.<br></p><figure class="highlight scala"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">1.</span> <span class="type">Prepare</span> resources on hdfs</span><br><span class="line">[robin<span class="meta">@node</span>01 ~]$ hdfs dfs -mkdir /flink/cache</span><br><span class="line">[robin<span class="meta">@node</span>01 ~]$ vi gender.txt</span><br><span class="line"> <span class="number">1</span>, male</span><br><span class="line"> <span 
class="number">2</span>, female</span><br><span class="line">[robin<span class="meta">@node</span>01 ~]$ hdfs dfs -put gender.txt /flink/cache</span><br><span class="line"> <span class="number">2.</span> <span class="type">Turn</span> on the source (socket)</span><br></pre></td></tr></table></figure><p>Code</p><figure class="highlight scala"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span 
class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> distributeCache</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.io.<span class="type">File</span></span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.common.functions.<span class="type">RichMapFunction</span></span><br><span class="line"><span class="keyword">import</span> org.apache.flink.streaming.api.scala.<span class="type">StreamExecutionEnvironment</span></span><br><span class="line"><span class="keyword">import</span> org.apache.flink.api.scala._</span><br><span class="line"><span class="keyword">import</span> org.apache.flink.configuration.<span class="type">Configuration</span></span><br><span class="line"><span class="keyword">import</span> scala.collection.mutable</span><br><span class="line"><span class="keyword">import</span> scala.io.&#123;<span class="type">BufferedSource</span>, <span class="type">Source</span>&#125;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">object</span> <span class="title">DistrubutedCacheTest</span> </span>&#123;</span><br><span class="line">  <span class="function"><span class="keyword">def</span> <span class="title">main</span></span>(args: <span class="type">Array</span>[<span class="type">String</span>]): <span class="type">Unit</span> = &#123;</span><br><span class="line">    <span 
class="comment">//1. Environment</span></span><br><span class="line">    <span class="keyword">val</span> env: <span class="type">StreamExecutionEnvironment</span> = <span class="type">StreamExecutionEnvironment</span>.getExecutionEnvironment</span><br><span class="line"></span><br><span class="line">    <span class="comment">//2. Read the resources on hdfs and set them in the distributed cache</span></span><br><span class="line">    env.registerCachedFile(<span class="string">"hdfs://node01:9000/flink/cache/gender.txt"</span>,<span class="string">"hdfsGenderInfo"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="comment">//3. Read the student records sent over the socket in real time, enrich them, and output the result</span></span><br><span class="line">    <span class="comment">//Sample input line (id,name,genderFlag,address): 101,jackson,1,Shanghai</span></span><br><span class="line">    env.socketTextStream(<span class="string">"node01"</span>,<span class="number">8888</span>)</span><br><span class="line">      .filter((_: <span class="type">String</span>).trim.nonEmpty)</span><br><span class="line">      .map(<span class="keyword">new</span> <span class="type">RichMapFunction</span>[<span class="type">String</span>,(<span class="type">Int</span>,<span class="type">String</span>,<span class="type">Char</span>,<span class="type">String</span>)] &#123;</span><br><span class="line"></span><br><span class="line">        <span class="comment">//Gender mapping (gender flag -&gt; gender initial) read from the distributed cache</span></span><br><span class="line">        <span class="keyword">val</span> map:mutable.<span class="type">Map</span>[<span class="type">Int</span>,<span class="type">Char</span>]= mutable.<span class="type">HashMap</span>()</span><br><span class="line">        <span class="keyword">var</span> bs: <span class="type">BufferedSource</span> = _</span><br><span class="line"></span><br><span class="line">        <span class="keyword">override</span> <span 
class="function"><span class="keyword">def</span> <span class="title">open</span></span>(parameters: <span class="type">Configuration</span>): <span class="type">Unit</span> = &#123;</span><br><span class="line">          <span class="comment">//1. Read the data stored in the distributed cache</span></span><br><span class="line">          <span class="keyword">val</span> file:<span class="type">File</span> = getRuntimeContext.getDistributedCache.getFile(<span class="string">"hdfsGenderInfo"</span>)</span><br><span class="line"></span><br><span class="line">          <span class="comment">//2. Parse each line and store the gender flag and gender initial in the map</span></span><br><span class="line">          bs = <span class="type">Source</span>.fromFile(file)</span><br><span class="line">          <span class="keyword">val</span> lst: <span class="type">List</span>[<span class="type">String</span>] = bs.getLines().toList</span><br><span class="line">          <span class="keyword">for</span>(perLine &lt;-lst)&#123;</span><br><span class="line">            <span class="keyword">val</span> arr: <span class="type">Array</span>[<span class="type">String</span>] = perLine.split(<span class="string">","</span>)</span><br><span class="line">            <span class="keyword">val</span> genderFlg: <span class="type">Int</span> = arr(<span class="number">0</span>).trim.toInt</span><br><span class="line">            <span class="keyword">val</span> genderName: <span class="type">Char</span> = arr(<span class="number">1</span>).trim.toCharArray()(<span class="number">0</span>)</span><br><span class="line">            map.put(genderFlg,genderName)</span><br><span class="line">          &#125;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">override</span> <span class="function"><span class="keyword">def</span> <span class="title">map</span></span>(perStudentInfo: <span class="type">String</span>): 
(<span class="type">Int</span>, <span class="type">String</span>, <span class="type">Char</span>, <span class="type">String</span>) = &#123;</span><br><span class="line">          <span class="comment">//Get student details</span></span><br><span class="line">          <span class="keyword">val</span> arr: <span class="type">Array</span>[<span class="type">String</span>] = perStudentInfo.split(<span class="string">","</span>)</span><br><span class="line">          <span class="keyword">val</span> id: <span class="type">Int</span> = arr(<span class="number">0</span>).trim.toInt</span><br><span class="line">          <span class="keyword">val</span> name: <span class="type">String</span> = arr(<span class="number">1</span>).trim</span><br><span class="line">          <span class="keyword">val</span> genderFlg: <span class="type">Int</span> = arr(<span class="number">2</span>).trim.toInt</span><br><span class="line">          <span class="keyword">val</span> address: <span class="type">String</span> = arr(<span class="number">3</span>).trim</span><br><span class="line">          <span class="comment">//Replace the gender flag with the real gender looked up in the map built from the distributed cache ('x' if the flag is unknown)</span></span><br><span class="line">          <span class="keyword">val</span> genderName: <span class="type">Char</span> = map.getOrElse(genderFlg, 'x')</span><br><span class="line">          (id, name, genderName, address)</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">override</span> <span class="function"><span class="keyword">def</span> <span class="title">close</span></span>(): <span class="type">Unit</span> = &#123;</span><br><span class="line">          <span class="keyword">if</span>(bs != <span class="literal">null</span>)&#123;</span><br><span class="line">            bs.close()</span><br><span class="line">          
&#125;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;).print(<span class="string">"The information completed by the student is -&gt;"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="comment">//4. Start</span></span><br><span class="line">    env.execute(<span class="keyword">this</span>.getClass.getSimpleName)</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><br>Result:<br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602161073629-bdd0b192-607e-406a-ab76-d0e8d10904f8.png#align=left&display=inline&height=86&margin=%5Bobject%20Object%5D&name=image.png&originHeight=104&originWidth=904&size=75915&status=done&style=none&width=746" alt="image.png"><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1602161084713-21827bbb-e936-41da-87d1-337c68685afc.png#align=left&display=inline&height=85&margin=%5Bobject%20Object%5D&name=image.png&originHeight=112&originWidth=980&size=149762&status=done&style=none&width=746" alt="image.png"></p><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;a name=&quot;aDwVA&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h3 id=&quot;分布式缓存&quot;&gt;&lt;a href=&quot;#分布式缓存&quot; class=&quot;headerlin
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="Flink" scheme="cpeixin.cn/tags/Flink/"/>
    
  </entry>
  
  <entry>
    <title>HBase RowKey设计</title>
    <link href="cpeixin.cn/2020/07/21/HBase-RowKey%E8%AE%BE%E8%AE%A1/"/>
    <id>cpeixin.cn/2020/07/21/HBase-RowKey%E8%AE%BE%E8%AE%A1/</id>
    <published>2020-07-21T01:42:33.000Z</published>
    <updated>2020-07-21T01:45:26.624Z</updated>
    
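    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --><p><a name="kT7dy"></a></p><h3 id="RowKey的作用"><a href="#RowKey的作用" class="headerlink" title="RowKey的作用"></a>The Role of the RowKey</h3><p><a name="i3ILb"></a></p><h4 id="RowKey在查询中的作用"><a href="#RowKey在查询中的作用" class="headerlink" title="RowKey在查询中的作用"></a>The RowKey in Queries</h4><p>In HBase, the RowKey uniquely identifies a row. There are three ways to retrieve data from HBase:</p><ul><li>a <strong>get</strong>, which specifies a <strong>RowKey</strong> and returns exactly one row<br></li><li>a <strong>scan</strong>, which sets the <strong>startRow</strong> and <strong>stopRow</strong> parameters to match a range of rows<br></li><li>a <strong>full table scan</strong>, which reads every row in the table</li></ul><p>When a large share of requests hits one or a few nodes of the HBase cluster, those few RegionServers handle too many reads and writes and become overloaded, while the other RegionServers carry very little load. This is called a <strong>hotspot</strong>.</p><p>Heavy traffic overloads the host serving the hot Region, which degrades performance and can even make the Region unavailable. When inserting data into HBase, we should therefore spread records as evenly as possible across Regions so that each Region carries a balanced load.</p><p>The following example shows which queries a RowKey design can support.<br>If the RowKey is designed as <code>uid</code>+<code>phone</code>+<code>name</code>, it supports the following queries well:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uid&#x3D;873969725 AND phone&#x3D;18900000000 AND name&#x3D;zhangsan</span><br><span class="line">uid&#x3D;873969725 AND phone&#x3D;18900000000</span><br><span class="line">uid&#x3D;873969725 AND phone&#x3D;189?</span><br><span class="line">uid&#x3D;873969725</span><br></pre></td></tr></table></figure><p>Queries that are hard to support:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">phone&#x3D;18900000000 AND name&#x3D;zhangsan</span><br><span class="line">phone&#x3D;18900000000</span><br><span class="line">name&#x3D;zhangsan</span><br></pre></td></tr></table></figure><p>As the example shows, a RowKey is matched from left to right during a query. So after choosing the RowKey fields, order them according to the actual high-frequency query patterns: the more frequently a field is queried, the further left it should be placed.</p><p><a name="BUcSY"></a></p><h4 id="RowKey在Region中的作用"><a href="#RowKey在Region中的作用" class="headerlink" title="RowKey在Region中的作用"></a>The RowKey in Regions</h4><p>In HBase, a Region is essentially a data shard. Each Region has a <code>StartRowKey</code> and a <code>StopRowKey</code>, which define the range of RowKeys it stores. Table data is distributed across Regions by RowKey, so to spread records evenly over the Regions, the RowKeys need to be well dispersed.</p><p>The RowKey is also closely tied to reads and writes:</p><ul><li>reads and writes locate the target Region by RowKey;<br></li><li>data in the MemStore is sorted by RowKey in lexicographic order;<br></li><li>data in HFiles is sorted by RowKey in lexicographic order.<br></li></ul><p><a name="2ZtXF"></a></p><h3 id="RowKey的设计"><a href="#RowKey的设计" class="headerlink" title="RowKey的设计"></a>RowKey Design</h3><p>The RowKey plays an important role in both data retrieval and data storage in HBase. A good RowKey design determines how data is distributed in HBase and also affects query efficiency, which is why RowKey design matters so much. Let us first look at the principles of RowKey design.<br></p><p><a name="hC4Pc"></a></p><h4 id="RowKey设计原则"><a href="#RowKey设计原则" class="headerlink" title="RowKey设计原则"></a>RowKey Design Principles</h4><p><strong>Length principle</strong><br>A RowKey is a binary byte stream and can be any string. Its maximum length is Short.MAX_VALUE (32,767) bytes (often quoted as 64 KB); in practice it is usually 10 to 100 bytes. It is stored as a byte[] and is usually designed to be fixed-length. Shorter is better, ideally no more than 16 bytes, for the following reasons:</p><ul><li>HFiles persist data as key-value pairs. If the RowKey is long, e.g. over 100 bytes, then for 10 million rows the RowKeys alone take up nearly 1 GB (100 B × 10,000,000 = 10^9 B), which greatly hurts HFile storage efficiency.<br></li><li>The MemStore caches part of the data in memory. If the RowKey is too long, memory is used less effectively and less data can be cached, which slows down retrieval.<br></li><li>Operating systems today are 64-bit and align memory on 8 bytes. Keeping the RowKey at 16 bytes, a multiple of 8, makes the best use of this property.<br></li></ul><p><strong>Uniqueness principle</strong><br>The design must guarantee that RowKeys are unique. HBase stores data as key-value pairs, so if a row with an existing RowKey is inserted into the same table, the new data overwrites the old data.<br><br><strong>Ordering principle</strong><br>HBase sorts RowKeys in lexicographic byte order (often described as ASCII order), so make full use of this ordering when designing RowKeys.<br><br><strong>Dispersion principle</strong><br>RowKeys should be distributed evenly across the HBase nodes.<br></p><p><a name="0wYMP"></a></p><h4 id="RowKey字段选择"><a href="#RowKey字段选择" class="headerlink" title="RowKey字段选择"></a>Choosing RowKey Fields</h4><p>When choosing RowKey fields, the <strong>most basic principle is uniqueness</strong>: the RowKey must uniquely identify a row. Whatever the workload looks like, the RowKey fields should be chosen <strong>with the most frequent queries in mind</strong>.</p><p>A database is usually designed around reading and consuming data efficiently, not around storing it. Next, adapt the chosen field values to the specific workload; when combining several fields, pay particular attention to their order.<br><a name="8vTR8"></a></p><p><a name="2bEL3"></a></p><h4 id="避免数据热点的方法"><a href="#避免数据热点的方法" class="headerlink" 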
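title="避免数据热点的方法"></a>Ways to Avoid Data Hotspots</h4><p>How can hotspots be avoided when reading from and writing to HBase? The main techniques are:</p><p><strong>Reversing</strong><br>If the initial RowKey design is unevenly distributed but the tail of the RowKey is highly random, consider reversing the RowKey, or simply moving the trailing bytes to the front. Reversing effectively randomizes the RowKey distribution, at the cost of RowKey ordering.<br>Downside: it is good for Get operations but bad for Scan operations, because the natural order of the original RowKeys is lost.<br><br><strong>Salting</strong><br>Salting adds a fixed-length random number in front of the original RowKey, i.e. it gives the RowKey a random prefix so that its beginning differs from that of the preceding RowKeys. The random prefix keeps the load balanced across all Regions.<br>Downside: because the prefix is random, a query based on the original RowKey cannot know which prefix was used, so it has to search every Region that might hold the row. Salting therefore hurts reads. On the other hand, it raises overall read/write throughput because the load is spread evenly.<br><br><strong>Hashing</strong><br>Hash all or part of the RowKey, then use the hash value to replace all or part of the original RowKey's prefix. The hash can be MD5, SHA-1, SHA-256, SHA-512, or a similar algorithm.<br>Downside: like Reversing, Hashing is bad for Scans because it destroys the natural order of the original RowKeys.<br></p><p><a name="O1jJU"></a></p><h3 id="RowKey设计案例剖析"><a href="#RowKey设计案例剖析" class="headerlink" title="RowKey设计案例剖析"></a>RowKey Design Case Studies</h3><p><strong>1. Query a user's operation records in a given app</strong></p><blockquote><p>reverse(userid) + appid + timestamp</p></blockquote><p><strong>2. Query a user's operation records in a given app (most recent first)</strong></p><blockquote><p>reverse(userid) + appid + (Long.MAX_VALUE - timestamp)</p></blockquote><p><strong>3. Query a user's operation records across all apps within a time range</strong></p><blockquote><p>reverse(userid) + timestamp + appid</p></blockquote><p><strong>4. Query a user's basic information</strong></p><blockquote><p>reverse(userid)</p></blockquote><p><strong>5. Query the records of a given eventid</strong></p><blockquote><p>salt + eventid + timestamp</p></blockquote><p>If <code>userid</code> is an incrementing number of varying length, first estimate the maximum length of <code>userid</code>, reverse it, and then pad the reversed string with trailing zeros up to that maximum length. If the length is fixed (a phone number, for example), simply reverse it.<br>In example 5, the salt is added to increase query concurrency. With a salt range of 0 to n-1, the data is split into n parts that can be scanned in parallel, which speeds up queries.<br></p><p><a name="QsQPN"></a></p><h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>Designing the RowKey is one of the most important steps when using HBase. When designing one, follow these steps:</p><ul><li>Based on the business scenario, choose suitable fields for the RowKey and order them by query frequency<br></li><li>Design the RowKey so that data is spread across the whole cluster as evenly as possible, balancing load and avoiding hotspots<br></li><li>Keep the RowKey as short as possible<br></li></ul><!-- rebuild by neat -->]]></content>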
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:17 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;&lt;a name=&quot;kT7dy&quot;&gt;&lt;/a&gt;&lt;/p&gt;&lt;h3 id=&quot;RowKey的作用&quot;&gt;&lt;a href=&quot;#RowKey的作用&quot; class=&quot;h
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="HBase" scheme="cpeixin.cn/tags/HBase/"/>
    
  </entry>
  
  <entry>
    <title>数据仓库超详细案例</title>
    <link href="cpeixin.cn/2020/07/07/%E6%95%B0%E6%8D%AE%E4%BB%93%E5%BA%93%E8%B6%85%E8%AF%A6%E7%BB%86%E6%A1%88%E4%BE%8B/"/>
    <id>cpeixin.cn/2020/07/07/%E6%95%B0%E6%8D%AE%E4%BB%93%E5%BA%93%E8%B6%85%E8%AF%A6%E7%BB%86%E6%A1%88%E4%BE%8B/</id>
    <published>2020-07-07T09:17:57.000Z</published>
    <updated>2020-08-03T09:19:05.745Z</updated>
    
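    <content type="html"><![CDATA[<!-- build time:Tue Jan 12 2021 23:56:18 GMT+0800 (GMT+08:00) --><p>The following post is reposted from <a href="https://www.cnblogs.com/ttzzyy/p/13255841.html" target="_blank" rel="external nofollow noopener noreferrer">Data Warehouse Case Study</a>. It is extremely detailed 👍👍👍👍👍<br><a name="9oBBW"></a></p><h1 id="离线数据仓库"><a href="#离线数据仓库" class="headerlink" title="离线数据仓库"></a>Offline Data Warehouse</h1><blockquote><p>A data warehouse (Data WareHouse) is a strategic collection that provides data support from all systems for every decision-making process in an enterprise.<br>Analyzing the data in a data warehouse helps an enterprise improve business processes, control costs, improve product quality, and so on.<br>A data warehouse is not the final destination of the data. Instead, it prepares the data for that destination through cleansing, transformation, classification, reorganization, merging, splitting, aggregation, and so on.</p></blockquote><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445205-31fdcfc5-e2d2-41a6-b5cc-624c31b3e574.png#align=left&display=inline&height=744&margin=%5Bobject%20Object%5D&originHeight=744&originWidth=1234&size=0&status=done&style=none&width=1234" alt><br><a name="blogTitle0"></a></p><h2 id="1-项目简介"><a href="#1-项目简介" class="headerlink" title="1 项目简介"></a>1 Project Overview</h2><p><a name="blogTitle1"></a></p><h3 id="1-1-项目需求"><a href="#1-1-项目需求" class="headerlink" title="1.1 项目需求"></a>1.1 Project Requirements</h3><ol><li>Build a platform for collecting user behavior data</li><li>Build a platform for collecting business data</li><li>Dimensional modeling of the data warehouse</li><li>Analysis by subject: users, traffic, members, products, sales, regions, campaigns, etc.</li><li>Use an ad hoc query tool to analyze metrics at any time</li><li>Monitor cluster performance and raise alerts on anomalies</li><li>Metadata management</li><li>Data quality monitoring</li></ol><p><a name="blogTitle2"></a></p><h3 id="1-2-技术选型"><a href="#1-2-技术选型" class="headerlink" title="1.2 技术选型"></a>1.2 Technology Selection</h3><ul><li><p>Data collection frameworks</p><table><thead><tr><th align="center">Collection framework</th><th>Main function</th></tr></thead><tbody><tr><td align="center">Sqoop</td><td>Import/export between the big data platform and relational databases</td></tr><tr><td align="center">Datax</td><td>Import/export between the big data platform and relational databases</td></tr><tr><td align="center">flume</td><td>Good at collecting and parsing log data</td></tr><tr><td align="center">logstash</td><td>Good at collecting and parsing log data</td></tr><tr><td align="center">maxwell</td><td>Commonly used to parse MySQL binlog data in real time</td></tr><tr><td align="center">canal</td><td>Commonly used to parse MySQL binlog data in real time</td></tr><tr><td align="center">waterDrop</td><td>Data import/export tool</td></tr></tbody></table></li><li><p>Message middleware</p><table><thead><tr><th 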
align="center"><strong>Open-source MQ</strong></th><th><strong>Overview</strong></th></tr></thead><tbody><tr><td align="center">RabbitMQ</td><td>Implemented by LShift in Erlang; supports multiple protocols; broker architecture; heavyweight</td></tr><tr><td align="center">ZeroMQ</td><td>Implemented by iMatix, the original designer of AMQP; lightweight messaging kernel; brokerless design; written in C++</td></tr><tr><td align="center">Kafka</td><td>Implemented by LinkedIn in Scala; supports parallel loading of data into Hadoop</td></tr><tr><td align="center">ActiveMQ</td><td>An Apache implementation of JMS; supports broker and P2P deployment; supports multiple protocols; written in Java</td></tr><tr><td align="center">Redis</td><td>Key-value NoSQL database with MQ features</td></tr><tr><td align="center">MemcacheQ</td><td>Message queue built by a Chinese developer on the memcache protocol; written in C/C++</td></tr></tbody></table></li><li><p>Persistent storage frameworks</p><table><thead><tr><th align="center">Framework</th><th>Main use</th></tr></thead><tbody><tr><td align="center">HDFS</td><td>Distributed file system</td></tr><tr><td align="center">Hbase</td><td>Key-value NoSQL database</td></tr><tr><td align="center">Kudu</td><td>Open-source storage engine from Cloudera, similar to HBase</td></tr><tr><td align="center">Hive</td><td>Data warehouse tool built on MapReduce</td></tr></tbody></table></li><li><p>Offline computing frameworks (Hive execution engine)</p><table><thead><tr><th align="center">Framework</th><th>Description</th></tr></thead><tbody><tr><td align="center">MapReduce</td><td>The earliest distributed computing framework</td></tr><tr><td align="center">Spark</td><td>Spark-based one-stop solution for batch and stream processing</td></tr><tr><td align="center">Flink</td><td>Flink-based one-stop solution for batch and stream processing</td></tr></tbody></table></li><li><p>Analytical databases</p><table><thead><tr><th>Feature</th><th align="center">Druid</th><th align="center">Kylin</th><th align="center">Presto</th><th align="center">Impala</th><th align="center">ES</th></tr></thead><tbody><tr><td>Sub-second response</td><td align="center">√</td><td align="center">√</td><td align="center">×</td><td align="center">×</td><td align="center">×</td></tr><tr><td>Tens of billions of rows</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√</td></tr><tr><td>SQL support</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√ (plugin required)</td></tr><tr><td>Offline</td><td 
align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">√</td></tr><tr><td>Real-time</td><td align="center">√</td><td align="center">√</td><td align="center">×</td><td align="center">×</td><td align="center">×</td></tr><tr><td>Exact distinct count</td><td align="center">×</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">×</td></tr><tr><td>Multi-table join</td><td align="center">×</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">×</td></tr><tr><td>JDBC for BI</td><td align="center">×</td><td align="center">√</td><td align="center">√</td><td align="center">√</td><td align="center">×</td></tr></tbody></table></li><li><p>Other choices</p><ul><li>Task scheduling: DolphinScheduler</li><li>Cluster monitoring: CM+CDH</li><li>Metadata management: Atlas</li><li>BI tools: Zeppelin, Superset</li></ul></li></ul><p><a name="blogTitle3"></a></p><h3 id="1-3-架构"><a href="#1-3-架构" class="headerlink" title="1.3 架构"></a>1.3 Architecture</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445253-8d922f5c-47b5-4f1d-ad8e-1a71eeae15aa.png#align=left&display=inline&height=736&margin=%5Bobject%20Object%5D&originHeight=736&originWidth=1703&size=0&status=done&style=none&width=1703" alt><br><a name="blogTitle4"></a></p><h3 id="1-4-集群资源规划"><a href="#1-4-集群资源规划" class="headerlink" title="1.4 集群资源规划"></a>1.4 Cluster Resource Planning</h3><ul><li><p>How to size the cluster (assuming each server has 8 TB of disk and 128 GB of RAM)</p><ol><li>1 million daily active users, each generating about 100 log entries per day: 1,000,000 * 100 = 100 million entries per day</li><li>Each entry is about 1 KB: 100 million KB / 1024 / 1024 = about 100 GB per day</li><li>Assuming no new servers for half a year: 100 GB * 180 days = about 18 TB</li><li>Keeping 3 replicas: 18 TB * 3 = 54 TB</li><li>Reserving a 20% ~ 30% buffer: 54 TB / 0.7 = about 77 TB</li><li>Conclusion: 77 TB / 8 TB per server ≈ 10 servers<blockquote><p>Due to limited resources, 3 servers are used in this walkthrough.</p></blockquote></li></ol></li></ul><table><thead><tr><th>Service</th><th>Sub-service</th><th>Server cdh01.cm</th><th>Server cdh02.cm</th><th>Server 
cdh03.cm</th></tr></thead><tbody><tr><td>HDFS</td><td>NameNode<br>DataNode<br>SecondaryNameNode</td><td>√<br>√</td><td>√</td><td>√<br>√</td></tr><tr><td>Yarn</td><td>NodeManager<br>Resourcemanager</td><td>√</td><td>√<br>√</td><td>√</td></tr><tr><td>Zookeeper</td><td>Zookeeper Server</td><td>√</td><td>√</td><td>√</td></tr><tr><td>Flume</td><td>Flume<br>Flume (consuming Kafka)</td><td>√</td><td>√</td><td>√</td></tr><tr><td>Kafka</td><td>Kafka</td><td>√</td><td>√</td><td>√</td></tr><tr><td>Hive</td><td>Hive</td><td>√</td><td></td><td></td></tr><tr><td>MySQL</td><td>MySQL</td><td>√</td><td></td><td></td></tr><tr><td>Sqoop</td><td>Sqoop</td><td>√</td><td></td><td></td></tr><tr><td>Presto</td><td>Coordinator<br>Worker</td><td>√</td><td>√</td><td>√</td></tr><tr><td>DolphinScheduler</td><td>DolphinScheduler</td><td>√</td><td></td><td></td></tr><tr><td>Druid</td><td>Druid</td><td>√</td><td>√</td><td>√</td></tr><tr><td>Kylin</td><td>Kylin</td><td>√</td><td></td><td></td></tr><tr><td>Hbase</td><td>HMaster<br>HRegionServer</td><td>√<br>√</td><td>√</td><td>√</td></tr><tr><td>Superset</td><td>Superset</td><td>√</td><td></td><td></td></tr><tr><td>Atlas</td><td>Atlas</td><td>√</td><td></td><td></td></tr><tr><td>Solr</td><td>Solr</td><td>√</td><td></td><td></td></tr></tbody></table><p><a name="blogTitle5"></a></p><h2 id="2-数据生成模块"><a href="#2-数据生成模块" class="headerlink" title="2 数据生成模块"></a>2 Data Generation Module</h2><blockquote><p>This module focuses on collecting user behavior data. Why collect user behavior data at all?<br>For an enterprise, users are revenue. User habits and similar data need to be collected so that derived big data products, such as a user-profile tagging system, can analyze them. User information is usually analyzed offline. Later, the analysis results can be stored in an inverted-index system such as ES and matched against user habits with real-time computation to make personalized recommendations. Going further, deep learning can be used to make recommendations to similar users.</p></blockquote><p><a name="blogTitle6"></a></p><h3 id="2-1-埋点数据基本格式"><a href="#2-1-埋点数据基本格式" class="headerlink" title="2.1 埋点数据基本格式"></a>2.1 Basic Format of Tracking Data</h3><ul><li><p>Common fields: fields that virtually every Android phone reports<br></p></li><li><p>Business fields: fields reported by event tracking, each tied to a specific business type<br></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">&quot;ap&quot;:&quot;xxxxx&quot;,&#x2F;&#x2F;data source of the project: app or pc</span><br><span class="line">&quot;cm&quot;: &#123; &#x2F;&#x2F;common fields</span><br><span class="line">&quot;mid&quot;: &quot;&quot;, &#x2F;&#x2F; (String) unique device ID</span><br><span class="line">&quot;uid&quot;: &quot;&quot;, &#x2F;&#x2F; (String) user ID</span><br><span class="line">&quot;vc&quot;: &quot;1&quot;, &#x2F;&#x2F; (String) versionCode, app version code</span><br><span class="line">&quot;vn&quot;: &quot;1.0&quot;, &#x2F;&#x2F; (String) versionName, app version name</span><br><span class="line">&quot;l&quot;: &quot;zh&quot;, &#x2F;&#x2F; (String) language, system language</span><br><span class="line">&quot;sr&quot;: &quot;&quot;, &#x2F;&#x2F; (String) channel ID: the channel the app was installed from</span><br><span class="line">&quot;os&quot;: &quot;7.1.1&quot;, &#x2F;&#x2F; (String) Android OS version</span><br><span class="line">&quot;ar&quot;: &quot;CN&quot;, &#x2F;&#x2F; (String) area 
(region)</span><br><span class="line">&quot;md&quot;: &quot;BBB100-1&quot;, &#x2F;&#x2F; (String) model, phone model</span><br><span class="line">&quot;ba&quot;: &quot;blackberry&quot;, &#x2F;&#x2F; (String) brand, phone brand</span><br><span class="line">&quot;sv&quot;: &quot;V2.2.1&quot;, &#x2F;&#x2F; (String) sdkVersion</span><br><span class="line">&quot;g&quot;: &quot;&quot;, &#x2F;&#x2F; (String) gmail</span><br><span class="line">&quot;hw&quot;: &quot;1620x1080&quot;, &#x2F;&#x2F; (String) heightXwidth, screen height and width</span><br><span class="line">&quot;t&quot;: &quot;1506047606608&quot;, &#x2F;&#x2F; (String) time the client generated the log</span><br><span class="line">&quot;nw&quot;: &quot;WIFI&quot;, &#x2F;&#x2F; (String) network type</span><br><span class="line">&quot;ln&quot;: 0, &#x2F;&#x2F; (double) lng, longitude</span><br><span class="line">&quot;la&quot;: 0 &#x2F;&#x2F; (double) lat, latitude</span><br><span class="line">&#125;,</span><br><span class="line">&quot;et&quot;: [ &#x2F;&#x2F;events</span><br><span class="line">&#123;</span><br><span class="line">&quot;ett&quot;: &quot;1506047605364&quot;, &#x2F;&#x2F;time the client generated the event</span><br><span class="line">&quot;en&quot;: &quot;display&quot;, &#x2F;&#x2F;event name</span><br><span class="line">&quot;kv&quot;: &#123; &#x2F;&#x2F;event payload, defined freely as key-value pairs</span><br><span class="line">&quot;goodsid&quot;: &quot;236&quot;,</span><br><span class="line">&quot;action&quot;: &quot;1&quot;,</span><br><span class="line">&quot;extend1&quot;: &quot;1&quot;,</span><br><span class="line">&quot;place&quot;: &quot;2&quot;,</span><br><span class="line">&quot;category&quot;: &quot;75&quot;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p>Sample log (server timestamp | log). The server timestamp helps gauge network transmission latency:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">1540934156385| &#123;</span><br><span class="line">&quot;ap&quot;: &quot;gmall&quot;,&#x2F;&#x2F;data warehouse (project) name</span><br><span class="line">&quot;cm&quot;: &#123;</span><br><span class="line">&quot;uid&quot;: &quot;1234&quot;,</span><br><span class="line">&quot;vc&quot;: &quot;2&quot;,</span><br><span class="line">&quot;vn&quot;: &quot;1.0&quot;,</span><br><span class="line">&quot;l&quot;: &quot;EN&quot;,</span><br><span class="line">&quot;sr&quot;: &quot;&quot;,</span><br><span class="line">&quot;os&quot;: &quot;7.1.1&quot;,</span><br><span class="line">&quot;ar&quot;: &quot;CN&quot;,</span><br><span class="line">&quot;md&quot;: &quot;BBB100-1&quot;,</span><br><span class="line">&quot;ba&quot;: &quot;blackberry&quot;,</span><br><span 
class="line">&quot;sv&quot;: &quot;V2.2.1&quot;,</span><br><span class="line">&quot;g&quot;: &quot;abc@gmail.com&quot;,</span><br><span class="line">&quot;hw&quot;: &quot;1620x1080&quot;,</span><br><span class="line">&quot;t&quot;: &quot;1506047606608&quot;,</span><br><span class="line">&quot;nw&quot;: &quot;WIFI&quot;,</span><br><span class="line">&quot;ln&quot;: 0,</span><br><span class="line">&quot;la&quot;: 0</span><br><span class="line">&#125;,</span><br><span class="line">&quot;et&quot;: [</span><br><span class="line">&#123;</span><br><span class="line">&quot;ett&quot;: &quot;1506047605364&quot;, &#x2F;&#x2F;time the client generated the event</span><br><span class="line">&quot;en&quot;: &quot;display&quot;, &#x2F;&#x2F;event name</span><br><span class="line">&quot;kv&quot;: &#123; &#x2F;&#x2F;event payload, defined freely as key-value pairs</span><br><span class="line">&quot;goodsid&quot;: &quot;236&quot;,</span><br><span class="line">&quot;action&quot;: &quot;1&quot;,</span><br><span class="line">&quot;extend1&quot;: &quot;1&quot;,</span><br><span class="line">&quot;place&quot;: &quot;2&quot;,</span><br><span class="line">&quot;category&quot;: &quot;75&quot;</span><br><span class="line">&#125;</span><br><span class="line">&#125;,&#123;</span><br><span class="line">&quot;ett&quot;: &quot;1552352626835&quot;,</span><br><span class="line">&quot;en&quot;: &quot;active_background&quot;,</span><br><span class="line">&quot;kv&quot;: &#123;</span><br><span class="line">&quot;active_source&quot;: &quot;1&quot;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="blogTitle7"></a></p><h3 id="2-2-埋点事件日志数据"><a href="#2-2-埋点事件日志数据" class="headerlink" title="2.2 埋点事件日志数据"></a>2.2 Tracking Event Log Data</h3><p><a name="bfe09d0c"></a></p><h4 id="2-2-1-商品列表页"><a href="#2-2-1-商品列表页" class="headerlink" title="2.2.1 商品列表页"></a>2.2.1 Product List Page</h4><p><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445220-53833589-bf8f-44b6-8c7b-465e87e1e711.png#align=left&display=inline&height=836&margin=%5Bobject%20Object%5D&originHeight=836&originWidth=482&size=0&status=done&style=none&width=482" alt></p></li><li><p>Event name: loading</p><table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>action</td><td>Action: start loading=1, load succeeded=2, load failed=3</td></tr><tr><td>loading_time</td><td>Loading duration: time from the start of the pull-down until the API returns data (report 0 when loading starts; the duration is reported only when loading succeeds or fails)</td></tr><tr><td>loading_way</td><td>Loading method: 1 = read from cache, 2 = fetch new data from the API (reported only when loading succeeds)</td></tr><tr><td>extend1</td><td>Extension field Extend1</td></tr><tr><td>extend2</td><td>Extension field Extend2</td></tr><tr><td>type</td><td>Loading trigger: automatic=1, user pull-down=2, bottom loading=3 (triggered by tapping the bottom hint bar or the back-to-top button)</td></tr><tr><td>type1</td><td>Failure code: the status code returned when loading fails (empty means loading succeeded)</td></tr></tbody></table></li></ul><p><a name="ee4b4b36"></a></p><h4 id="2-2-2-商品点击"><a href="#2-2-2-商品点击" class="headerlink" title="2.2.2 商品点击"></a>2.2.2 Product Clicks</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445213-0bbd34e0-bb23-4e03-b90e-3ca09d0c5569.png#align=left&display=inline&height=838&margin=%5Bobject%20Object%5D&originHeight=838&originWidth=499&size=0&status=done&style=none&width=499" alt></p><ul><li>Event tag: display<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>action</td><td>Action: product impression=1, product click=2</td></tr><tr><td>goodsid</td><td>Product ID (issued by the server)</td></tr><tr><td>place</td><td>Position (index of the product in the list: the first is 0, the second is 1, and so on)</td></tr><tr><td>extend1</td><td>Impression type: 1 = first impression, 2 = repeat impression</td></tr><tr><td>category</td><td>Category ID (defined by the server)</td></tr></tbody></table></li></ul><p><a name="16ea94b9"></a></p><h4 id="2-2-3-商品详情页"><a href="#2-2-3-商品详情页" class="headerlink" title="2.2.3 商品详情页"></a>2.2.3 Product Detail Page</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445268-6588114c-b79a-4e79-8763-250f838f3536.png#align=left&display=inline&height=711&margin=%5Bobject%20Object%5D&originHeight=711&originWidth=493&size=0&status=done&style=none&width=493" 
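alt></p><ul><li>Event tag: newsdetail<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>entry</td><td>Page entry source: app home page=1, push=2, related recommendations on a detail page=3</td></tr><tr><td>action</td><td>Action: start loading=1, load succeeded=2 (counted as a PV), load failed=3, exit page=4</td></tr><tr><td>goodsid</td><td>Product ID (issued by the server)</td></tr><tr><td>show_style</td><td>Product display style: 0 = no image, 1 = one large image, 2 = two images, 3 = three small images, 4 = one small image, 5 = one large image plus two small images</td></tr><tr><td>news_staytime</td><td>Page dwell time: measured from when the product starts loading until the user closes the page. If the user jumps to another page midway, timing pauses and resumes on returning to the detail page. If the user stays away for more than 10 minutes, this measurement is discarded and not reported. If the user exits before loading succeeds, the field is reported empty.</td></tr><tr><td>loading_time</td><td>Loading duration: time from when the page starts loading until the API returns data (report 0 when loading starts; the duration is reported only when loading succeeds or fails)</td></tr><tr><td>type1</td><td>Failure code: the status code returned when loading fails (empty means loading succeeded)</td></tr><tr><td>category</td><td>Category ID (defined by the server)</td></tr></tbody></table></li></ul><p><a name="571b8d48"></a></p><h4 id="2-2-4-广告"><a href="#2-2-4-广告" class="headerlink" title="2.2.4 广告"></a>2.2.4 Ads</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445229-1c9049ba-ee68-4f5c-ad71-bba666fde424.png#align=left&display=inline&height=858&margin=%5Bobject%20Object%5D&originHeight=858&originWidth=513&size=0&status=done&style=none&width=513" alt></p><ul><li>Event name: ad<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>entry</td><td>Entry: product list page=1, app home page=2, product detail page=3</td></tr><tr><td>action</td><td>Action: ad shown=1, ad clicked=2</td></tr><tr><td>contentType</td><td>Type: 1 = product, 2 = marketing campaign</td></tr><tr><td>displayMills</td><td>Display duration in milliseconds</td></tr><tr><td>itemId</td><td>Product ID</td></tr><tr><td>activityId</td><td>Marketing campaign ID</td></tr></tbody></table></li></ul><p><a name="23c440b4"></a></p><h4 id="2-2-5-消息通知"><a href="#2-2-5-消息通知" class="headerlink" title="2.2.5 消息通知"></a>2.2.5 Notifications</h4><ul><li>Event tag: notification<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>action</td><td>Action: notification created=1, notification popped up=2, notification clicked=3, persistent notification shown=4 (not reported repeatedly; at most once per day)</td></tr><tr><td>type</td><td>Notification ID: alert=1, weather forecast (morning=2, evening=3), persistent=4</td></tr><tr><td>ap_time</td><td>Time the notification popped up on the client</td></tr><tr><td>content</td><td>Reserved field</td></tr></tbody></table></li></ul><p><a 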
name="409b258a"></a></p><h4 id="2-2-6-用户后台活跃"><a href="#2-2-6-用户后台活跃" class="headerlink" title="2.2.6 用户后台活跃"></a>2.2.6 User Background Activity</h4><ul><li>Event tag: active_background<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>active_source</td><td>1=upgrade, 2=download, 3=plugin_upgrade</td></tr></tbody></table></li></ul><p><a name="d3d9a6ee"></a></p><h4 id="2-2-7-评论"><a href="#2-2-7-评论" class="headerlink" title="2.2.7 评论"></a>2.2.7 Comments</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445234-c1ea35e6-d731-485d-9f74-e1eccba7431e.png#align=left&display=inline&height=894&margin=%5Bobject%20Object%5D&originHeight=894&originWidth=537&size=0&status=done&style=none&width=537" alt></p><ul><li>Description: comment table (comment)<table><thead><tr><th align="center">No.</th><th>Field name</th><th>Description</th><th align="center">Type</th><th align="center">Length</th><th align="center">Nullable</th><th align="center">Default</th></tr></thead><tbody><tr><td align="center">1</td><td>comment_id</td><td>Comment ID</td><td align="center">int</td><td align="center">10,0</td><td align="center"></td><td align="center"></td></tr><tr><td align="center">2</td><td>userid</td><td>User ID</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center">0</td></tr><tr><td align="center">3</td><td>p_comment_id</td><td>Parent comment ID (0 = top-level comment,<br>non-zero = reply)</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">4</td><td>content</td><td>Comment text</td><td align="center">string</td><td align="center">1000</td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">5</td><td>addtime</td><td>Creation time</td><td align="center">string</td><td align="center"></td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">6</td><td>other_id</td><td>ID of the item the comment refers to</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center"></td></tr><tr><td 
align="center">7</td><td>praise_count</td><td>Number of likes</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center">0</td></tr><tr><td align="center">8</td><td>reply_count</td><td>Number of replies</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center">0</td></tr></tbody></table></li></ul><p><a name="aa37918b"></a></p><h4 id="2-2-8-收藏"><a href="#2-2-8-收藏" class="headerlink" title="2.2.8 收藏"></a>2.2.8 Favorites</h4><ul><li>Description: favorites table (favorites)<table><thead><tr><th align="center">No.</th><th>Field name</th><th>Description</th><th align="center">Type</th><th align="center">Length</th><th align="center">Nullable</th><th align="center">Default</th></tr></thead><tbody><tr><td align="center">1</td><td>id</td><td>Primary key</td><td align="center">int</td><td align="center">10,0</td><td align="center"></td><td align="center"></td></tr><tr><td align="center">2</td><td>course_id</td><td>Product ID</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center">0</td></tr><tr><td align="center">3</td><td>userid</td><td>User ID</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center">0</td></tr><tr><td align="center">4</td><td>add_time</td><td>Creation time</td><td align="center">string</td><td align="center"></td><td align="center">√</td><td align="center"></td></tr></tbody></table></li></ul><p><a name="2407bbc4"></a></p><h4 id="2-2-9-点赞"><a href="#2-2-9-点赞" class="headerlink" title="2.2.9 点赞"></a>2.2.9 Likes</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445204-a4518660-4192-4dc1-b934-848b40a364d9.png#align=left&display=inline&height=772&margin=%5Bobject%20Object%5D&originHeight=772&originWidth=493&size=0&status=done&style=none&width=493" alt></p><ul><li>Description: table of all likes (praise)<table><thead><tr><th align="center">No.</th><th>Field name</th><th>Description</th><th align="center">Type</th><th align="center">Length</th><th align="center">Nullable</th><th 
align="center">Default</th></tr></thead><tbody><tr><td align="center">1</td><td>id</td><td>Primary key ID</td><td align="center">int</td><td align="center">10,0</td><td align="center"></td><td align="center"></td></tr><tr><td align="center">2</td><td>userid</td><td>User ID</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">3</td><td>target_id</td><td>ID of the liked object</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">4</td><td>type</td><td>Like type: 1 = Q&amp;A like, 2 = Q&amp;A comment like,<br>3 = article like, 4 = comment like</td><td align="center">int</td><td align="center">10,0</td><td align="center">√</td><td align="center"></td></tr><tr><td align="center">5</td><td>add_time</td><td>Time added</td><td align="center">string</td><td align="center"></td><td align="center">√</td><td align="center"></td></tr></tbody></table></li></ul><p><a name="f4f90cd0"></a></p><h4 id="2-2-10-错误日志"><a href="#2-2-10-错误日志" class="headerlink" title="2.2.10 错误日志"></a>2.2.10 Error Logs</h4><table><thead><tr><th align="center"><strong>Tag</strong></th><th align="center"><strong>Meaning</strong></th></tr></thead><tbody><tr><td align="center">errorBrief</td><td align="center">Error summary</td></tr><tr><td align="center">errorDetail</td><td align="center">Error details</td></tr></tbody></table><p><a name="blogTitle8"></a></p><h3 id="2-3-埋点启动日志数据"><a href="#2-3-埋点启动日志数据" class="headerlink" title="2.3 埋点启动日志数据"></a>2.3 Tracking Start-up Log Data</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">&quot;action&quot;:&quot;1&quot;,</span><br><span class="line">&quot;ar&quot;:&quot;MX&quot;,</span><br><span class="line">&quot;ba&quot;:&quot;HTC&quot;,</span><br><span class="line">&quot;detail&quot;:&quot;&quot;,</span><br><span class="line">&quot;en&quot;:&quot;start&quot;,</span><br><span class="line">&quot;entry&quot;:&quot;2&quot;,</span><br><span class="line">&quot;extend1&quot;:&quot;&quot;,</span><br><span class="line">&quot;g&quot;:&quot;43R2SEQX@gmail.com&quot;,</span><br><span class="line">&quot;hw&quot;:&quot;640*960&quot;,</span><br><span class="line">&quot;l&quot;:&quot;en&quot;,</span><br><span class="line">&quot;la&quot;:&quot;20.4&quot;,</span><br><span class="line">&quot;ln&quot;:&quot;-99.3&quot;,</span><br><span class="line">&quot;loading_time&quot;:&quot;2&quot;,</span><br><span class="line">&quot;md&quot;:&quot;HTC-2&quot;,</span><br><span class="line">&quot;mid&quot;:&quot;995&quot;,</span><br><span class="line">&quot;nw&quot;:&quot;4G&quot;,</span><br><span class="line">&quot;open_ad_type&quot;:&quot;2&quot;,</span><br><span class="line">&quot;os&quot;:&quot;8.1.2&quot;,</span><br><span class="line">&quot;sr&quot;:&quot;B&quot;,</span><br><span class="line">&quot;sv&quot;:&quot;V2.0.6&quot;,</span><br><span class="line">&quot;t&quot;:&quot;1561472502444&quot;,</span><br><span class="line">&quot;uid&quot;:&quot;995&quot;,</span><br><span class="line">&quot;vc&quot;:&quot;10&quot;,</span><br><span class="line">&quot;vn&quot;:&quot;1.3.4&quot;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>Event tag: start<table><thead><tr><th>Tag</th><th>Meaning</th></tr></thead><tbody><tr><td>entry</td><td>Entry point: push=1, widget=2, icon=3, notification=4, lockscreen_widget=5</td></tr><tr><td>open_ad_type</td><td>Splash ad type: native splash ad=1, interstitial splash ad=2</td></tr><tr><td>action</td><td>Status: success=1, failure=2</td></tr><tr><td>loading_time</td><td>Loading duration: time from the start of the pull-down until the API returns data (report 0 when loading starts; the duration is reported only when loading succeeds or fails)</td></tr><tr><td>detail</td><td>Failure code (empty if none)</td></tr><tr><td>extend1</td><td>Failure message (empty if none)</td></tr><tr><td>en</td><td>Log type: start</td></tr></tbody></table></li></ul><p><a name="blogTitle9"></a></p><h3 id="2-4-数据生成脚本"><a href="#2-4-数据生成脚本" class="headerlink" title="2.4 数据生成脚本"></a>2.4 Data Generation Script</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445276-36599c24-74d5-4b7f-8674-1979b36b32cf.png#align=left&display=inline&height=815&margin=%5Bobject%20Object%5D&originHeight=815&originWidth=1918&size=0&status=done&style=none&width=1918" alt></p><blockquote><p>The example below skips the log-generation process shown in the red box in the figure and uses a Java program to build the logFile files directly.</p></blockquote><p><a name="4c767f75"></a></p><h4 id="2-4-1-数据生成格式"><a href="#2-4-1-数据生成格式" class="headerlink" title="2.4.1 数据生成格式"></a>2.4.1 Generated Data Format</h4><ul><li>Start-up log</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445197-5930bda1-7de7-4d77-8450-d1408dde46ad.png#align=left&display=inline&height=738&margin=%5Bobject%20Object%5D&originHeight=738&originWidth=1301&size=0&status=done&style=none&width=1301" alt></p><blockquote><p>{"action":"1","ar":"MX","ba":"Sumsung","detail":"201","en":"start","entry":"4","extend1":"","g":"69021X1Q@gmail.com","hw":"1080*1920","l":"pt","la":"-11.0","ln":"-70.0","loading_time":"9","md":"sumsung-5","mid":"244","nw":"3G","open_ad_type":"1","os":"8.2.3","sr":"D","sv":"V2.1.3","t":"1589612165914","uid":"244","vc":"16","vn":"1.2.1"}</p></blockquote><ul><li>Event log (because of a conversion issue, the figure omits the leading 
"timestamp|" prefix)</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445200-dd221676-3d59-4d7f-a516-52882e9754d2.png#align=left&display=inline&height=810&margin=%5Bobject%20Object%5D&originHeight=810&originWidth=1300&size=0&status=done&style=none&width=1300" alt></p><blockquote><p>1589695383284|{"cm":{"ln":"-79.4","sv":"V2.5.3","os":"8.0.6","g":"81614U54@gmail.com","mid":"245","nw":"WIFI","l":"pt","vc":"6","hw":"1080*1920","ar":"MX","uid":"245","t":"1589627025851","la":"-39.6","md":"HTC-7","vn":"1.3.5","ba":"HTC","sr":"N"},"ap":"app","et":[{"ett":"1589650631883","en":"display","kv":{"goodsid":"53","action":"2","extend1":"2","place":"3","category":"50"}},{"ett":"1589690866312","en":"newsdetail","kv":{"entry":"3","goodsid":"54","news_staytime":"1","loading_time":"6","action":"4","showtype":"0","category":"78","type1":""}},{"ett":"1589641734037","en":"loading","kv":{"extend2":"","loading_time":"0","action":"1","extend1":"","type":"2","type1":"201","loading_way":"2"}},{"ett":"1589687684878","en":"ad","kv":{"activityId":"1","displayMills":"92030","entry":"3","action":"5","contentType":"0"}},{"ett":"1589632980772","en":"active_background","kv":{"active_source":"1"}},{"ett":"1589682030324","en":"error","kv":{"errorDetail":"java.lang.NullPointerException\n at cn.lift.appIn.web.AbstractBaseController.validInbound(AbstractBaseController.java:72)\n at cn.lift.dfdf.web.AbstractBaseController.validInbound","errorBrief":"at 
cn.lift.dfdf.web.AbstractBaseController.validInbound(AbstractBaseController.java:72)"}},{"ett":"1589675065650","en":"comment","kv":{"p_comment_id":2,"addtime":"1589624299628","praise_count":509,"other_id":6,"comment_id":7,"reply_count":35,"userid":3,"content":"关色芦候佰间纶珊斑禁尹赞涤仇彭企呵姜毅"}},{"ett":"1589631359459","en":"favorites","kv":{"course_id":7,"id":0,"add_time":"1589681240066","userid":7}},{"ett":"1589616574187","en":"praise","kv":{"target_id":1,"id":7,"type":3,"add_time":"1589642497314","userid":8}}]}</p></blockquote><p><a name="e84eff3b"></a></p><h4 id="2-4-2-创建maven工程"><a href="#2-4-2-创建maven工程" class="headerlink" title="2.4.2 创建maven工程"></a>2.4.2 Create the Maven Project</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445354-3c28f41c-e63c-4fb9-84d2-2758e24117d7.png#align=left&display=inline&height=356&margin=%5Bobject%20Object%5D&originHeight=356&originWidth=1501&size=0&status=done&style=none&width=1501" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445239-29830e5f-2628-4b61-b25c-860285f21cc5.png#align=left&display=inline&height=733&margin=%5Bobject%20Object%5D&originHeight=733&originWidth=1500&size=0&status=done&style=none&width=1500" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445257-f8650a51-09ee-42f3-b9e6-c374f207c5b9.png#align=left&display=inline&height=402&margin=%5Bobject%20Object%5D&originHeight=402&originWidth=1501&size=0&status=done&style=none&width=1501" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445238-b3a082a7-5282-4ffe-b763-62f4d159a8b3.png#align=left&display=inline&height=395&margin=%5Bobject%20Object%5D&originHeight=395&originWidth=1501&size=0&status=done&style=none&width=1501" alt></p><ul><li><p>data-producer: pom.xml</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br></pre></td><td class="code"><pre><span class="line">&lt;!--centralized version numbers--&gt;</span><br><span class="line">    &lt;properties&gt;</span><br><span class="line">  
      &lt;slf4j.version&gt;1.7.20&lt;&#x2F;slf4j.version&gt;</span><br><span class="line">        &lt;logback.version&gt;1.0.7&lt;&#x2F;logback.version&gt;</span><br><span class="line">    &lt;&#x2F;properties&gt;</span><br><span class="line">    &lt;dependencies&gt; &lt;!--Alibaba's open-source JSON parsing library--&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;com.alibaba&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;fastjson&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;1.2.51&lt;&#x2F;version&gt;</span><br><span class="line">        &lt;&#x2F;dependency&gt; &lt;!--logging framework--&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;ch.qos.logback&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;logback-core&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;$&#123;logback.version&#125;&lt;&#x2F;version&gt;</span><br><span class="line">        &lt;&#x2F;dependency&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;ch.qos.logback&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;logback-classic&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;$&#123;logback.version&#125;&lt;&#x2F;version&gt;</span><br><span class="line">        &lt;&#x2F;dependency&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;org.projectlombok&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;lombok&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;1.18.10&lt;&#x2F;version&gt;</span><br><span class="line">            &lt;scope&gt;provided&lt;&#x2F;scope&gt;</span><br><span class="line">        
&lt;&#x2F;dependency&gt;</span><br><span class="line">    &lt;&#x2F;dependencies&gt;</span><br><span class="line">    &lt;build&gt;</span><br><span class="line">        &lt;plugins&gt;</span><br><span class="line">            &lt;plugin&gt;</span><br><span class="line">                &lt;artifactId&gt;maven-compiler-plugin&lt;&#x2F;artifactId&gt;</span><br><span class="line">                &lt;version&gt;2.3.2&lt;&#x2F;version&gt;</span><br><span class="line">                &lt;configuration&gt;</span><br><span class="line">                    &lt;source&gt;1.8&lt;&#x2F;source&gt;</span><br><span class="line">                    &lt;target&gt;1.8&lt;&#x2F;target&gt;</span><br><span class="line">                &lt;&#x2F;configuration&gt;</span><br><span class="line">            &lt;&#x2F;plugin&gt;</span><br><span class="line">            &lt;plugin&gt;</span><br><span class="line">                &lt;artifactId&gt;maven-assembly-plugin&lt;&#x2F;artifactId&gt;</span><br><span class="line">                &lt;configuration&gt;</span><br><span class="line">                    &lt;descriptorRefs&gt;</span><br><span class="line">                        &lt;descriptorRef&gt;jar-with-dependencies&lt;&#x2F;descriptorRef&gt;</span><br><span class="line">                    &lt;&#x2F;descriptorRefs&gt;</span><br><span class="line">                    &lt;archive&gt;</span><br><span class="line">                        &lt;manifest&gt;</span><br><span class="line">                            &lt;!--main class name--&gt;</span><br><span class="line">                            &lt;mainClass&gt;com.heaton.bigdata.datawarehouse.app.App&lt;&#x2F;mainClass&gt;</span><br><span class="line">                        &lt;&#x2F;manifest&gt;</span><br><span class="line">                    &lt;&#x2F;archive&gt;</span><br><span class="line">                &lt;&#x2F;configuration&gt;</span><br><span class="line">                &lt;executions&gt;</span><br><span class="line">                    
&lt;execution&gt;</span><br><span class="line">                        &lt;id&gt;make-assembly&lt;&#x2F;id&gt;</span><br><span class="line">                        &lt;phase&gt;package&lt;&#x2F;phase&gt;</span><br><span class="line">                        &lt;goals&gt;</span><br><span class="line">                            &lt;goal&gt;single&lt;&#x2F;goal&gt;</span><br><span class="line">                        &lt;&#x2F;goals&gt;</span><br><span class="line">                    &lt;&#x2F;execution&gt;</span><br><span class="line">                &lt;&#x2F;executions&gt;</span><br><span class="line">            &lt;&#x2F;plugin&gt;</span><br><span class="line">        &lt;&#x2F;plugins&gt;</span><br><span class="line">    &lt;&#x2F;build&gt;</span><br></pre></td></tr></table></figure></li><li><p>data-producer: logback.xml</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td 
class="code"><pre><span class="line">&lt;?xml version&#x3D;&quot;1.0&quot; encoding&#x3D;&quot;UTF-8&quot;?&gt;</span><br><span class="line">&lt;configuration debug&#x3D;&quot;false&quot;&gt; &lt;!-- Where log files are stored; do not use a relative path in the Logback configuration --&gt;</span><br><span class="line">    &lt;property name&#x3D;&quot;LOG_HOME&quot; value&#x3D;&quot;&#x2F;root&#x2F;logs&#x2F;&quot;&#x2F;&gt; &lt;!-- Console output --&gt;</span><br><span class="line">    &lt;appender name&#x3D;&quot;STDOUT&quot; class&#x3D;&quot;ch.qos.logback.core.ConsoleAppender&quot;&gt;</span><br><span class="line">        &lt;encoder</span><br><span class="line">                class&#x3D;&quot;ch.qos.logback.classic.encoder.PatternLayoutEncoder&quot;&gt; &lt;!-- Output format: %d is the date, %thread the thread name, %-5level the level left-aligned in 5 characters, %msg the log message, %n a newline --&gt;</span><br><span class="line">            &lt;pattern&gt;%d&#123;yyyy-MM-dd HH:mm:ss.SSS&#125; [%thread] %-5level %logger&#123;50&#125; - %msg%n&lt;&#x2F;pattern&gt;</span><br><span class="line">        &lt;&#x2F;encoder&gt;</span><br><span class="line">    &lt;&#x2F;appender&gt; &lt;!-- Generate one log file per day; stores the event logs --&gt;</span><br><span class="line">    &lt;appender name&#x3D;&quot;FILE&quot;</span><br><span class="line">              class&#x3D;&quot;ch.qos.logback.core.rolling.RollingFileAppender&quot;&gt; &lt;!-- &lt;File&gt;$&#123;LOG_HOME&#125;&#x2F;app.log&lt;&#x2F;File&gt; would set the path used while the log is below $&#123;log.max.size&#125;; note that in a web project the file would be saved under Tomcat's bin directory --&gt;</span><br><span class="line">        &lt;rollingPolicy class&#x3D;&quot;ch.qos.logback.core.rolling.TimeBasedRollingPolicy&quot;&gt; &lt;!-- File name pattern of the output log files --&gt;</span><br><span class="line">            &lt;FileNamePattern&gt;$&#123;LOG_HOME&#125;&#x2F;app-%d&#123;yyyy-MM-dd&#125;.log&lt;&#x2F;FileNamePattern&gt; &lt;!-- Number of days to keep log files --&gt;</span><br><span class="line">            &lt;MaxHistory&gt;30&lt;&#x2F;MaxHistory&gt;</span><br><span class="line">        &lt;&#x2F;rollingPolicy&gt;</span><br><span class="line">        &lt;encoder 
class&#x3D;&quot;ch.qos.logback.classic.encoder.PatternLayoutEncoder&quot;&gt;</span><br><span class="line">            &lt;pattern&gt;%msg%n&lt;&#x2F;pattern&gt;</span><br><span class="line">        &lt;&#x2F;encoder&gt; &lt;!-- Maximum size of a log file --&gt;</span><br><span class="line">        &lt;triggeringPolicy class&#x3D;&quot;ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy&quot;&gt;</span><br><span class="line">            &lt;MaxFileSize&gt;10MB&lt;&#x2F;MaxFileSize&gt;</span><br><span class="line">        &lt;&#x2F;triggeringPolicy&gt;</span><br><span class="line">    &lt;&#x2F;appender&gt; &lt;!-- Asynchronous logging --&gt;</span><br><span class="line">    &lt;appender name&#x3D;&quot;ASYNC_FILE&quot;</span><br><span class="line">              class&#x3D;&quot;ch.qos.logback.classic.AsyncAppender&quot;&gt; &lt;!-- Do not drop logs. By default, once the queue is 80% full, TRACE, DEBUG and INFO events are discarded --&gt;</span><br><span class="line">        &lt;discardingThreshold&gt;0&lt;&#x2F;discardingThreshold&gt; &lt;!-- Change the default queue depth, which affects performance; the default is 256 --&gt;</span><br><span class="line">        &lt;queueSize&gt;512&lt;&#x2F;queueSize&gt; &lt;!-- Attach the target appender; at most one can be attached --&gt;</span><br><span class="line">        &lt;appender-ref ref&#x3D;&quot;FILE&quot;&#x2F;&gt;</span><br><span class="line">    &lt;&#x2F;appender&gt; &lt;!-- Log output level --&gt;</span><br><span class="line">    &lt;root level&#x3D;&quot;INFO&quot;&gt;</span><br><span class="line">        &lt;appender-ref ref&#x3D;&quot;STDOUT&quot;&#x2F;&gt;</span><br><span class="line">        &lt;appender-ref ref&#x3D;&quot;ASYNC_FILE&quot;&#x2F;&gt;</span><br><span class="line">    &lt;&#x2F;root&gt;</span><br><span class="line">&lt;&#x2F;configuration&gt;</span><br></pre></td></tr></table></figure></li><li><p>data-flume: pom.xml</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">&lt;dependencies&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;org.apache.flume&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;flume-ng-core&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;1.9.0&lt;&#x2F;version&gt;</span><br><span class="line">        &lt;&#x2F;dependency&gt;</span><br><span class="line">    &lt;&#x2F;dependencies&gt;</span><br><span class="line">    &lt;build&gt;</span><br><span class="line">        &lt;plugins&gt;</span><br><span class="line">            &lt;plugin&gt;</span><br><span class="line">                &lt;artifactId&gt;maven-compiler-plugin&lt;&#x2F;artifactId&gt;</span><br><span class="line">                &lt;version&gt;2.3.2&lt;&#x2F;version&gt;</span><br><span class="line">                &lt;configuration&gt;</span><br><span class="line">                    &lt;source&gt;1.8&lt;&#x2F;source&gt;</span><br><span class="line">                    &lt;target&gt;1.8&lt;&#x2F;target&gt;</span><br><span class="line">                &lt;&#x2F;configuration&gt;</span><br><span class="line">            &lt;&#x2F;plugin&gt;</span><br><span class="line">        &lt;&#x2F;plugins&gt;</span><br><span class="line">    &lt;&#x2F;build&gt;</span><br></pre></td></tr></table></figure></li><li><p>hive-function: pom.xml</p><figure 
class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">&lt;dependencies&gt;</span><br><span class="line">        &lt;dependency&gt;</span><br><span class="line">            &lt;groupId&gt;org.apache.hive&lt;&#x2F;groupId&gt;</span><br><span class="line">            &lt;artifactId&gt;hive-exec&lt;&#x2F;artifactId&gt;</span><br><span class="line">            &lt;version&gt;2.1.1&lt;&#x2F;version&gt;</span><br><span class="line">        &lt;&#x2F;dependency&gt;</span><br><span class="line">    &lt;&#x2F;dependencies&gt;</span><br><span class="line">    &lt;build&gt;</span><br><span class="line">        &lt;plugins&gt;</span><br><span class="line">            &lt;plugin&gt;</span><br><span class="line">                &lt;artifactId&gt;maven-compiler-plugin&lt;&#x2F;artifactId&gt;</span><br><span class="line">                &lt;version&gt;2.3.2&lt;&#x2F;version&gt;</span><br><span class="line">                &lt;configuration&gt;</span><br><span class="line">                    &lt;source&gt;1.8&lt;&#x2F;source&gt;</span><br><span class="line">                    &lt;target&gt;1.8&lt;&#x2F;target&gt;</span><br><span class="line">                &lt;&#x2F;configuration&gt;</span><br><span class="line">            &lt;&#x2F;plugin&gt;</span><br><span class="line">        &lt;&#x2F;plugins&gt;</span><br><span class="line">    
&lt;&#x2F;build&gt;</span><br></pre></td></tr></table></figure><p><a name="9ad75b74"></a></p><h4 id="2-4-3-各事件bean"><a href="#2-4-3-各事件bean" class="headerlink" title="2.4.3 各事件bean"></a>2.4.3 Event beans</h4><blockquote><p>data-producer project</p></blockquote></li></ul><p><a name="fd2c10ec"></a></p><h5 id="2-4-3-1-公共日志类"><a href="#2-4-3-1-公共日志类" class="headerlink" title="2.4.3.1 公共日志类"></a>2.4.3.1 Common log class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Common log class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppBase &#123;</span><br><span class="line">    private String mid; &#x2F;&#x2F; (String) unique device ID</span><br><span class="line">    private String uid; &#x2F;&#x2F; (String) user uid</span><br><span class="line">    private String vc; 
&#x2F;&#x2F; (String) versionCode, app version code</span><br><span class="line">    private String vn; &#x2F;&#x2F; (String) versionName, app version name</span><br><span class="line">    private String l; &#x2F;&#x2F; (String) system language</span><br><span class="line">    private String sr; &#x2F;&#x2F; (String) channel code: the channel the app was installed from</span><br><span class="line">    private String os; &#x2F;&#x2F; (String) Android OS version</span><br><span class="line">    private String ar; &#x2F;&#x2F; (String) region</span><br><span class="line">    private String md; &#x2F;&#x2F; (String) phone model</span><br><span class="line">    private String ba; &#x2F;&#x2F; (String) phone brand</span><br><span class="line">    private String sv; &#x2F;&#x2F; (String) sdkVersion</span><br><span class="line">    private String g; &#x2F;&#x2F; (String) gmail</span><br><span class="line">    private String hw; &#x2F;&#x2F; (String) heightXwidth, screen height and width</span><br><span class="line">    private String t; &#x2F;&#x2F; (String) time at which the client generated the log</span><br><span class="line">    private String nw; &#x2F;&#x2F; (String) network type</span><br><span class="line">    private String ln; &#x2F;&#x2F; (double) lng, longitude</span><br><span class="line">    private String la; &#x2F;&#x2F; (double) lat, latitude</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="b18452f7"></a></p><h5 id="2-4-3-2-启动日志类"><a href="#2-4-3-2-启动日志类" class="headerlink" title="2.4.3.2 启动日志类"></a>2.4.3.2 App start log class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe App start log class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppStart extends AppBase &#123;</span><br><span class="line">    private String entry;&#x2F;&#x2F;entry point: push&#x3D;1, widget&#x3D;2, icon&#x3D;3, notification&#x3D;4, lockscreen_widget</span><br><span class="line">    private String open_ad_type;&#x2F;&#x2F;splash ad type: native splash ad&#x3D;1, interstitial splash ad&#x3D;2</span><br><span class="line">    private String action;&#x2F;&#x2F;status: success&#x3D;1, failure&#x3D;2</span><br><span class="line">    private String loading_time;&#x2F;&#x2F;loading duration: time from the start of the pull-down until the API returns data (report 0 when loading starts; report the time only on success or failure)</span><br><span class="line">    private String detail;&#x2F;&#x2F;failure code (report empty if none)</span><br><span class="line">    private String extend1;&#x2F;&#x2F;failure message (report empty if none)</span><br><span class="line">    private String en;&#x2F;&#x2F;start-log type marker</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="9355740a"></a></p><h5 id="2-4-3-3-错误日志类"><a href="#2-4-3-3-错误日志类" class="headerlink" title="2.4.3.3 错误日志类"></a>2.4.3.3 Error log class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span 
class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Error log class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppErrorLog &#123;</span><br><span class="line">    private String errorBrief; &#x2F;&#x2F;error summary</span><br><span class="line">    private String errorDetail; &#x2F;&#x2F;error details</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="14491595"></a></p><h5 id="2-4-3-4-商品点击日志类"><a href="#2-4-3-4-商品点击日志类" class="headerlink" title="2.4.3.4 商品点击日志类"></a>2.4.3.4 Product click log class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Product click log class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppDisplay &#123;</span><br><span class="line">    private String action;&#x2F;&#x2F;action: product impression&#x3D;1, product click&#x3D;2</span><br><span class="line">    private String goodsid;&#x2F;&#x2F;product ID (issued by the server)</span><br><span class="line">    private String place;&#x2F;&#x2F;position (index of the product in the list: the first is 0, the second is 1, and so on)</span><br>
<span class="line">    private String extend1;&#x2F;&#x2F;impression type: 1 &#x3D; first impression, 2 &#x3D; repeat impression (unused)</span><br><span class="line">    private String category;&#x2F;&#x2F;category ID (defined by the server)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="938fbf96"></a></p><h5 id="2-4-3-5-商品详情类"><a href="#2-4-3-5-商品详情类" class="headerlink" title="2.4.3.5 商品详情类"></a>2.4.3.5 Product detail class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Product detail class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppNewsDetail &#123;</span><br><span class="line">    private String entry;&#x2F;&#x2F;page entry source: app home page&#x3D;1, push&#x3D;2, related recommendations on the detail page</span><br><span class="line">    private String action;&#x2F;&#x2F;action: loading started&#x3D;1, loading succeeded&#x3D;2 (pv), loading failed&#x3D;3, page exited&#x3D;4</span><br><span class="line">    private String goodsid;&#x2F;&#x2F;product ID (issued by the server)</span><br><span class="line">    private String showtype;&#x2F;&#x2F;product layout: 0 &#x3D; no image, 1 &#x3D; one large image, 2 &#x3D; two images, 3 &#x3D; three small images, 4 &#x3D; one small image, 5 &#x3D; one large image plus two small images; products from related recommendations on the detail page always report 0 (they all use text on the left, image on the right)</span><br>
<span class="line">    private String news_staytime;&#x2F;&#x2F;time on page: counted from when the product starts loading until the user closes the page. If the user jumps to another page, timing pauses and resumes on returning to the detail page; if the user is away for more than 10 minutes, this measurement is discarded and not reported. If the user leaves before loading succeeds, report empty.</span><br><span class="line">    private String loading_time;&#x2F;&#x2F;loading duration: time from the page starting to load until the API returns data (report 0 when loading starts; report the time only on success or failure)</span><br><span class="line">    private String type1;&#x2F;&#x2F;loading failure code: report the failure status code (empty means loading succeeded)</span><br><span class="line">    private String category;&#x2F;&#x2F;category ID (defined by the server)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="3841af8f"></a></p><h5 id="2-4-3-6-商品列表类"><a href="#2-4-3-6-商品列表类" class="headerlink" title="2.4.3.6 商品列表类"></a>2.4.3.6 Product list class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Product list class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppLoading &#123;</span><br><span class="line">    private String action;&#x2F;&#x2F;action: loading started&#x3D;1, loading succeeded&#x3D;2, loading failed</span><br><span class="line">    private String loading_time;&#x2F;&#x2F;loading duration: time from the start of the pull-down until the API returns data (report 0 when loading starts; report the time only on success or failure)</span><br>
<span class="line">    private String loading_way;&#x2F;&#x2F;loading source: 1 &#x3D; read from cache, 2 &#x3D; fetch fresh data from the API (reported only when loading succeeds)</span><br><span class="line">    private String extend1;&#x2F;&#x2F;extension field Extend1</span><br><span class="line">    private String extend2;&#x2F;&#x2F;extension field Extend2</span><br><span class="line">    private String type;&#x2F;&#x2F;loading trigger: automatic&#x3D;1, user pull-down&#x3D;2, bottom&#x3D;3 (tapping the bottom hint bar or tapping back-to-top)</span><br><span class="line">    private String type1;&#x2F;&#x2F;loading failure code: report the failure status code (empty means loading succeeded)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="fe8dffa0"></a></p><h5 id="2-4-3-7-广告类"><a href="#2-4-3-7-广告类" class="headerlink" title="2.4.3.7 广告类"></a>2.4.3.7 Ad class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Ad class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppAd &#123;</span><br><span class="line">    private String entry;&#x2F;&#x2F;entry point: product list page&#x3D;1, app home page&#x3D;2, product detail page&#x3D;3</span><br><span class="line">    private String action;&#x2F;&#x2F;action: ad shown&#x3D;1, ad clicked&#x3D;2</span><br><span class="line">    private String contentType;&#x2F;&#x2F;Type: 1 &#x3D; product, 2 &#x3D; marketing campaign</span><br>
<span class="line">    private String displayMills;&#x2F;&#x2F;display duration in milliseconds</span><br><span class="line">    private String itemId; &#x2F;&#x2F;product id</span><br><span class="line">    private String activityId; &#x2F;&#x2F;marketing campaign id</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="7e94e787"></a></p><h5 id="2-4-3-8-消息通知日志类"><a href="#2-4-3-8-消息通知日志类" class="headerlink" title="2.4.3.8 消息通知日志类"></a>2.4.3.8 Notification log class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Notification log class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppNotification &#123;</span><br><span class="line">    private String action;&#x2F;&#x2F;action: notification created&#x3D;1, notification shown&#x3D;2, notification clicked&#x3D;3, persistent notification shown (not reported repeatedly; at most once per day)</span><br><span class="line">    private String type;&#x2F;&#x2F;notification id: alert&#x3D;1, weather forecast (morning&#x3D;2, evening&#x3D;3), persistent&#x3D;4</span><br><span class="line">    private String ap_time;&#x2F;&#x2F;time at which the client displayed the notification</span><br><span class="line">    private String content;&#x2F;&#x2F;reserved field</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="f7b8845a"></a></p><h5 id="2-4-3-9-用户后台活跃类"><a href="#2-4-3-9-用户后台活跃类" 
class="headerlink" title="2.4.3.9 用户后台活跃类"></a>2.4.3.9 User background activity class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe User background activity class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppActive &#123;</span><br><span class="line">    private String active_source;&#x2F;&#x2F;1&#x3D;upgrade, 2&#x3D;download, 3&#x3D;plugin_upgrade</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="eed5386c"></a></p><h5 id="2-4-3-10-用户评论类"><a href="#2-4-3-10-用户评论类" class="headerlink" title="2.4.3.10 用户评论类"></a>2.4.3.10 User comment class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br>
<span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe User comment class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppComment &#123;</span><br><span class="line">    private int comment_id;&#x2F;&#x2F;comment table id</span><br><span class="line">    private int userid;&#x2F;&#x2F;user id</span><br><span class="line">    private int p_comment_id;&#x2F;&#x2F;parent comment id (0 means a top-level comment, non-zero means a reply)</span><br><span class="line">    private String content;&#x2F;&#x2F;comment content</span><br><span class="line">    private String addtime;&#x2F;&#x2F;creation time</span><br><span class="line">    private int other_id;&#x2F;&#x2F;related id of the comment</span><br><span class="line">    private int praise_count;&#x2F;&#x2F;number of likes</span><br><span class="line">    private int reply_count;&#x2F;&#x2F;number of replies</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="01adfa90"></a></p><h5 id="2-4-3-11-用户收藏类"><a href="#2-4-3-11-用户收藏类" class="headerlink" title="2.4.3.11 用户收藏类"></a>2.4.3.11 User favorites class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br>
<span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe User favorites class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppFavorites &#123;</span><br><span class="line">    private int id;&#x2F;&#x2F;primary key</span><br><span class="line">    private int course_id;&#x2F;&#x2F;product id</span><br><span class="line">    private int userid;&#x2F;&#x2F;user ID</span><br><span class="line">    private String add_time;&#x2F;&#x2F;creation time</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="381f0e2a"></a></p><h5 id="2-4-3-12-用户点赞类"><a href="#2-4-3-12-用户点赞类" class="headerlink" title="2.4.3.12 用户点赞类"></a>2.4.3.12 User like class</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">import lombok.Data;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe User like class</span><br><span class="line"> *&#x2F;</span><br><span class="line">@Data</span><br><span class="line">public class AppPraise &#123;</span><br><span class="line">    private int id; &#x2F;&#x2F;primary key id</span><br><span class="line">    private int userid;&#x2F;&#x2F;user id</span><br><span class="line">    private int target_id;&#x2F;&#x2F;id of the liked object</span><br><span class="line">    private int type;&#x2F;&#x2F;like type: 1 &#x3D; Q&amp;A like, 2 &#x3D; Q&amp;A comment like, 3 &#x3D; article like, 4 &#x3D; comment like</span><br>
<span class="line">    private String add_time;&#x2F;&#x2F;time added</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="c9fc3b91"></a></p><h4 id="2-4-4-启动类"><a href="#2-4-4-启动类" class="headerlink" title="2.4.4 启动类"></a>2.4.4 Launcher class</h4><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span 
class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span 
class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span 
class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br><span 
class="line">234</span><br><span class="line">235</span><br><span class="line">236</span><br><span class="line">237</span><br><span class="line">238</span><br><span class="line">239</span><br><span class="line">240</span><br><span class="line">241</span><br><span class="line">242</span><br><span class="line">243</span><br><span class="line">244</span><br><span class="line">245</span><br><span class="line">246</span><br><span class="line">247</span><br><span class="line">248</span><br><span class="line">249</span><br><span class="line">250</span><br><span class="line">251</span><br><span class="line">252</span><br><span class="line">253</span><br><span class="line">254</span><br><span class="line">255</span><br><span class="line">256</span><br><span class="line">257</span><br><span class="line">258</span><br><span class="line">259</span><br><span class="line">260</span><br><span class="line">261</span><br><span class="line">262</span><br><span class="line">263</span><br><span class="line">264</span><br><span class="line">265</span><br><span class="line">266</span><br><span class="line">267</span><br><span class="line">268</span><br><span class="line">269</span><br><span class="line">270</span><br><span class="line">271</span><br><span class="line">272</span><br><span class="line">273</span><br><span class="line">274</span><br><span class="line">275</span><br><span class="line">276</span><br><span class="line">277</span><br><span class="line">278</span><br><span class="line">279</span><br><span class="line">280</span><br><span class="line">281</span><br><span class="line">282</span><br><span class="line">283</span><br><span class="line">284</span><br><span class="line">285</span><br><span class="line">286</span><br><span class="line">287</span><br><span class="line">288</span><br><span class="line">289</span><br><span class="line">290</span><br><span class="line">291</span><br><span class="line">292</span><br><span class="line">293</span><br><span 
class="line">294</span><br><span class="line">295</span><br><span class="line">296</span><br><span class="line">297</span><br><span class="line">298</span><br><span class="line">299</span><br><span class="line">300</span><br><span class="line">301</span><br><span class="line">302</span><br><span class="line">303</span><br><span class="line">304</span><br><span class="line">305</span><br><span class="line">306</span><br><span class="line">307</span><br><span class="line">308</span><br><span class="line">309</span><br><span class="line">310</span><br><span class="line">311</span><br><span class="line">312</span><br><span class="line">313</span><br><span class="line">314</span><br><span class="line">315</span><br><span class="line">316</span><br><span class="line">317</span><br><span class="line">318</span><br><span class="line">319</span><br><span class="line">320</span><br><span class="line">321</span><br><span class="line">322</span><br><span class="line">323</span><br><span class="line">324</span><br><span class="line">325</span><br><span class="line">326</span><br><span class="line">327</span><br><span class="line">328</span><br><span class="line">329</span><br><span class="line">330</span><br><span class="line">331</span><br><span class="line">332</span><br><span class="line">333</span><br><span class="line">334</span><br><span class="line">335</span><br><span class="line">336</span><br><span class="line">337</span><br><span class="line">338</span><br><span class="line">339</span><br><span class="line">340</span><br><span class="line">341</span><br><span class="line">342</span><br><span class="line">343</span><br><span class="line">344</span><br><span class="line">345</span><br><span class="line">346</span><br><span class="line">347</span><br><span class="line">348</span><br><span class="line">349</span><br><span class="line">350</span><br><span class="line">351</span><br><span class="line">352</span><br><span class="line">353</span><br><span 
class="line">354</span><br><span class="line">355</span><br><span class="line">356</span><br><span class="line">357</span><br><span class="line">358</span><br><span class="line">359</span><br><span class="line">360</span><br><span class="line">361</span><br><span class="line">362</span><br><span class="line">363</span><br><span class="line">364</span><br><span class="line">365</span><br><span class="line">366</span><br><span class="line">367</span><br><span class="line">368</span><br><span class="line">369</span><br><span class="line">370</span><br><span class="line">371</span><br><span class="line">372</span><br><span class="line">373</span><br><span class="line">374</span><br><span class="line">375</span><br><span class="line">376</span><br><span class="line">377</span><br><span class="line">378</span><br><span class="line">379</span><br><span class="line">380</span><br><span class="line">381</span><br><span class="line">382</span><br><span class="line">383</span><br><span class="line">384</span><br><span class="line">385</span><br><span class="line">386</span><br><span class="line">387</span><br><span class="line">388</span><br><span class="line">389</span><br><span class="line">390</span><br><span class="line">391</span><br><span class="line">392</span><br><span class="line">393</span><br><span class="line">394</span><br><span class="line">395</span><br><span class="line">396</span><br><span class="line">397</span><br><span class="line">398</span><br><span class="line">399</span><br><span class="line">400</span><br><span class="line">401</span><br><span class="line">402</span><br><span class="line">403</span><br><span class="line">404</span><br><span class="line">405</span><br><span class="line">406</span><br><span class="line">407</span><br><span class="line">408</span><br><span class="line">409</span><br><span class="line">410</span><br><span class="line">411</span><br><span class="line">412</span><br><span class="line">413</span><br><span 
class="line">414</span><br><span class="line">415</span><br><span class="line">416</span><br><span class="line">417</span><br><span class="line">418</span><br><span class="line">419</span><br><span class="line">420</span><br><span class="line">421</span><br><span class="line">422</span><br><span class="line">423</span><br><span class="line">424</span><br><span class="line">425</span><br><span class="line">426</span><br><span class="line">427</span><br><span class="line">428</span><br><span class="line">429</span><br><span class="line">430</span><br><span class="line">431</span><br><span class="line">432</span><br><span class="line">433</span><br><span class="line">434</span><br><span class="line">435</span><br><span class="line">436</span><br><span class="line">437</span><br><span class="line">438</span><br><span class="line">439</span><br><span class="line">440</span><br><span class="line">441</span><br><span class="line">442</span><br><span class="line">443</span><br><span class="line">444</span><br><span class="line">445</span><br><span class="line">446</span><br><span class="line">447</span><br><span class="line">448</span><br><span class="line">449</span><br><span class="line">450</span><br><span class="line">451</span><br><span class="line">452</span><br><span class="line">453</span><br><span class="line">454</span><br><span class="line">455</span><br><span class="line">456</span><br><span class="line">457</span><br><span class="line">458</span><br><span class="line">459</span><br><span class="line">460</span><br><span class="line">461</span><br><span class="line">462</span><br><span class="line">463</span><br><span class="line">464</span><br><span class="line">465</span><br><span class="line">466</span><br><span class="line">467</span><br><span class="line">468</span><br><span class="line">469</span><br><span class="line">470</span><br><span class="line">471</span><br><span class="line">472</span><br><span class="line">473</span><br><span 
class="line">474</span><br><span class="line">475</span><br><span class="line">476</span><br><span class="line">477</span><br><span class="line">478</span><br><span class="line">479</span><br><span class="line">480</span><br><span class="line">481</span><br><span class="line">482</span><br><span class="line">483</span><br><span class="line">484</span><br><span class="line">485</span><br><span class="line">486</span><br><span class="line">487</span><br><span class="line">488</span><br><span class="line">489</span><br><span class="line">490</span><br><span class="line">491</span><br><span class="line">492</span><br><span class="line">493</span><br><span class="line">494</span><br><span class="line">495</span><br><span class="line">496</span><br><span class="line">497</span><br><span class="line">498</span><br><span class="line">499</span><br><span class="line">500</span><br><span class="line">501</span><br><span class="line">502</span><br><span class="line">503</span><br><span class="line">504</span><br><span class="line">505</span><br><span class="line">506</span><br><span class="line">507</span><br><span class="line">508</span><br><span class="line">509</span><br><span class="line">510</span><br><span class="line">511</span><br><span class="line">512</span><br><span class="line">513</span><br><span class="line">514</span><br><span class="line">515</span><br><span class="line">516</span><br><span class="line">517</span><br><span class="line">518</span><br><span class="line">519</span><br><span class="line">520</span><br><span class="line">521</span><br><span class="line">522</span><br><span class="line">523</span><br><span class="line">524</span><br><span class="line">525</span><br><span class="line">526</span><br><span class="line">527</span><br><span class="line">528</span><br><span class="line">529</span><br><span class="line">530</span><br><span class="line">531</span><br><span class="line">532</span><br><span class="line">533</span><br><span 
class="line">534</span><br><span class="line">535</span><br><span class="line">536</span><br><span class="line">537</span><br><span class="line">538</span><br><span class="line">539</span><br><span class="line">540</span><br><span class="line">541</span><br><span class="line">542</span><br><span class="line">543</span><br><span class="line">544</span><br><span class="line">545</span><br><span class="line">546</span><br><span class="line">547</span><br><span class="line">548</span><br><span class="line">549</span><br><span class="line">550</span><br><span class="line">551</span><br><span class="line">552</span><br><span class="line">553</span><br><span class="line">554</span><br><span class="line">555</span><br><span class="line">556</span><br><span class="line">557</span><br><span class="line">558</span><br><span class="line">559</span><br><span class="line">560</span><br><span class="line">561</span><br><span class="line">562</span><br><span class="line">563</span><br><span class="line">564</span><br><span class="line">565</span><br><span class="line">566</span><br><span class="line">567</span><br><span class="line">568</span><br><span class="line">569</span><br><span class="line">570</span><br><span class="line">571</span><br><span class="line">572</span><br><span class="line">573</span><br><span class="line">574</span><br><span class="line">575</span><br><span class="line">576</span><br><span class="line">577</span><br><span class="line">578</span><br><span class="line">579</span><br><span class="line">580</span><br><span class="line">581</span><br><span class="line">582</span><br><span class="line">583</span><br><span class="line">584</span><br><span class="line">585</span><br><span class="line">586</span><br><span class="line">587</span><br><span class="line">588</span><br><span class="line">589</span><br><span class="line">590</span><br><span class="line">591</span><br><span class="line">592</span><br><span class="line">593</span><br><span 
class="line">594</span><br><span class="line">595</span><br><span class="line">596</span><br><span class="line">597</span><br><span class="line">598</span><br><span class="line">599</span><br><span class="line">600</span><br><span class="line">601</span><br><span class="line">602</span><br><span class="line">603</span><br><span class="line">604</span><br><span class="line">605</span><br><span class="line">606</span><br><span class="line">607</span><br><span class="line">608</span><br><span class="line">609</span><br><span class="line">610</span><br><span class="line">611</span><br><span class="line">612</span><br><span class="line">613</span><br><span class="line">614</span><br><span class="line">615</span><br><span class="line">616</span><br><span class="line">617</span><br><span class="line">618</span><br><span class="line">619</span><br><span class="line">620</span><br><span class="line">621</span><br><span class="line">622</span><br><span class="line">623</span><br><span class="line">624</span><br><span class="line">625</span><br><span class="line">626</span><br><span class="line">627</span><br><span class="line">628</span><br><span class="line">629</span><br><span class="line">630</span><br><span class="line">631</span><br><span class="line">632</span><br><span class="line">633</span><br><span class="line">634</span><br><span class="line">635</span><br><span class="line">636</span><br><span class="line">637</span><br><span class="line">638</span><br><span class="line">639</span><br><span class="line">640</span><br><span class="line">641</span><br><span class="line">642</span><br><span class="line">643</span><br><span class="line">644</span><br><span class="line">645</span><br><span class="line">646</span><br><span class="line">647</span><br><span class="line">648</span><br><span class="line">649</span><br><span class="line">650</span><br><span class="line">651</span><br><span class="line">652</span><br><span class="line">653</span><br><span 
class="line">654</span><br><span class="line">655</span><br><span class="line">656</span><br><span class="line">657</span><br><span class="line">658</span><br><span class="line">659</span><br><span class="line">660</span><br><span class="line">661</span><br><span class="line">662</span><br><span class="line">663</span><br></pre></td><td class="code"><pre><span class="line">import com.alibaba.fastjson.JSON;</span><br><span class="line">import com.alibaba.fastjson.JSONArray;</span><br><span class="line">import com.alibaba.fastjson.JSONObject;</span><br><span class="line">import org.slf4j.Logger;</span><br><span class="line">import org.slf4j.LoggerFactory;</span><br><span class="line">import java.io.UnsupportedEncodingException;</span><br><span class="line">import java.util.Random;</span><br><span class="line">&#x2F;**</span><br><span class="line"> * @author Heaton</span><br><span class="line"> * @email 70416450@qq.com</span><br><span class="line"> * @date 2020&#x2F;4&#x2F;25 14:54</span><br><span class="line"> * @describe Entry point class</span><br><span class="line"> *&#x2F;</span><br><span class="line">public class App &#123;</span><br><span class="line">    private final static Logger logger &#x3D; LoggerFactory.getLogger(App.class);</span><br><span class="line">    private static Random rand &#x3D; new Random();</span><br><span class="line">    &#x2F;&#x2F; Device id</span><br><span class="line">    private static int s_mid &#x3D; 0;</span><br><span class="line">    &#x2F;&#x2F; User id</span><br><span class="line">    private static int s_uid &#x3D; 0;</span><br><span class="line">    &#x2F;&#x2F; Product id</span><br><span class="line">    private static int s_goodsid &#x3D; 0;</span><br><span class="line">    public static void main(String[] args) &#123;</span><br><span class="line">        &#x2F;&#x2F; Argument 1: delay in milliseconds between records, default 0</span><br><span class="line">        Long delay &#x3D; args.length &gt; 0 ? 
Long.parseLong(args[0]) : 0L;</span><br><span class="line">        &#x2F;&#x2F; Argument 2: number of loop iterations, default 1000</span><br><span class="line">        int loop_len &#x3D; args.length &gt; 1 ? Integer.parseInt(args[1]) : 1000;</span><br><span class="line">        &#x2F;&#x2F; Generate the data</span><br><span class="line">        generateLog(delay, loop_len);</span><br><span class="line">    &#125;</span><br><span class="line">    private static void generateLog(Long delay, int loop_len) &#123;</span><br><span class="line">        for (int i &#x3D; 0; i &lt; loop_len; i++) &#123;</span><br><span class="line">            int flag &#x3D; rand.nextInt(2);</span><br><span class="line">            switch (flag) &#123;</span><br><span class="line">                case (0):</span><br><span class="line">                    &#x2F;&#x2F; App start log</span><br><span class="line">                    AppStart appStart &#x3D; generateStart();</span><br><span class="line">                    String jsonString &#x3D; JSON.toJSONString(appStart);</span><br><span class="line">                    &#x2F;&#x2F; Print to the console</span><br><span class="line">                    logger.info(jsonString);</span><br><span class="line">                    break;</span><br><span class="line">                case (1):</span><br><span class="line">                    JSONObject json &#x3D; new JSONObject();</span><br><span class="line">                    json.put(&quot;ap&quot;, &quot;app&quot;);</span><br><span class="line">                    json.put(&quot;cm&quot;, generateComFields());</span><br><span class="line">                    JSONArray eventsArray &#x3D; new JSONArray();</span><br><span class="line">                    &#x2F;&#x2F; Event logs: each event type is included with 50% probability</span><br><span class="line">                    &#x2F;&#x2F; Product click &#x2F; display</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateDisplay());</span><br><span class="line">                        
json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Product detail page</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateNewsDetail());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Product list page</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateNewList());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Ad</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateAd());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Notification</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateNotification());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; User active in the background</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateBackground());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">    
                &#125;</span><br><span class="line">                    &#x2F;&#x2F; Error log</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateError());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; User comment</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateComment());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; User favorite</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generateFavorites());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; User like</span><br><span class="line">                    if (rand.nextBoolean()) &#123;</span><br><span class="line">                        eventsArray.add(generatePraise());</span><br><span class="line">                        json.put(&quot;et&quot;, eventsArray);</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Current time in milliseconds</span><br><span class="line">                    long millis &#x3D; System.currentTimeMillis();</span><br><span class="line">                    &#x2F;&#x2F; Print to the console as &quot;timestamp|json&quot;</span><br><span class="line">                    logger.info(millis + &quot;|&quot; + json.toJSONString());</span><br><span class="line">                    break;</span><br><span class="line">            
&#125;</span><br><span class="line">            &#x2F;&#x2F; Delay between records</span><br><span class="line">            try &#123;</span><br><span class="line">                Thread.sleep(delay);</span><br><span class="line">            &#125; catch (InterruptedException e) &#123;</span><br><span class="line">                e.printStackTrace();</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Build the common fields</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateComFields() &#123;</span><br><span class="line">        AppBase appBase &#x3D; new AppBase();</span><br><span class="line">        &#x2F;&#x2F; Device id</span><br><span class="line">        appBase.setMid(s_mid + &quot;&quot;);</span><br><span class="line">        s_mid++;</span><br><span class="line">        &#x2F;&#x2F; User id</span><br><span class="line">        appBase.setUid(s_uid + &quot;&quot;);</span><br><span class="line">        s_uid++;</span><br><span class="line">        &#x2F;&#x2F; App version code, e.g. 5, 6</span><br><span class="line">        appBase.setVc(&quot;&quot; + rand.nextInt(20));</span><br><span class="line">        &#x2F;&#x2F; App version name, e.g. 1.1.1</span><br><span class="line">        appBase.setVn(&quot;1.&quot; + rand.nextInt(4) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Android OS version</span><br><span class="line">        appBase.setOs(&quot;8.&quot; + rand.nextInt(3) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Language: es, en or pt</span><br><span class="line">        int flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case (0):</span><br><span class="line">                appBase.setL(&quot;es&quot;);</span><br><span class="line">                break;</span><br><span 
class="line">            case (1):</span><br><span class="line">                appBase.setL(&quot;en&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case (2):</span><br><span class="line">                appBase.setL(&quot;pt&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Source channel the user came from</span><br><span class="line">        appBase.setSr(getRandomChar(1));</span><br><span class="line">        &#x2F;&#x2F; Region: BR or MX</span><br><span class="line">        flag &#x3D; rand.nextInt(2);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appBase.setAr(&quot;BR&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">                appBase.setAr(&quot;MX&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Phone brand (ba) and model (md); the model number has at most two digits</span><br><span class="line">        flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appBase.setBa(&quot;Samsung&quot;);</span><br><span class="line">                appBase.setMd(&quot;samsung-&quot; + rand.nextInt(20));</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">                appBase.setBa(&quot;Huawei&quot;);</span><br><span class="line">                appBase.setMd(&quot;Huawei-&quot; + rand.nextInt(20));</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appBase.setBa(&quot;HTC&quot;);</span><br><span class="line">                appBase.setMd(&quot;HTC-&quot; + rand.nextInt(20));</span><br><span 
class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Version of the embedded SDK</span><br><span class="line">        appBase.setSv(&quot;V2.&quot; + rand.nextInt(10) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Gmail address</span><br><span class="line">        appBase.setG(getRandomCharAndNumr(8) + &quot;@gmail.com&quot;);</span><br><span class="line">        &#x2F;&#x2F; Screen width*height (hw)</span><br><span class="line">        flag &#x3D; rand.nextInt(4);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appBase.setHw(&quot;640*960&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">                appBase.setHw(&quot;640*1136&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appBase.setHw(&quot;750*1334&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 3:</span><br><span class="line">                appBase.setHw(&quot;1080*1920&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Time the client generated the log: now minus a random offset of up to about 28 hours</span><br><span class="line">        long millis &#x3D; System.currentTimeMillis();</span><br><span class="line">        appBase.setT(&quot;&quot; + (millis - rand.nextInt(99999999)));</span><br><span class="line">        &#x2F;&#x2F; Network type: 3G, 4G or WIFI</span><br><span class="line">        flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appBase.setNw(&quot;3G&quot;);</span><br><span class="line">                
break;</span><br><span class="line">            case 1:</span><br><span class="line">                appBase.setNw(&quot;4G&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appBase.setNw(&quot;WIFI&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 拉丁美洲 西经34°46′至西经117°09；北纬32°42′至南纬53°54′</span><br><span class="line">        &#x2F;&#x2F; 经度</span><br><span class="line">        appBase.setLn((-34 - rand.nextInt(83) - rand.nextInt(60) &#x2F; 10.0) + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; 纬度</span><br><span class="line">        appBase.setLa((32 - rand.nextInt(85) - rand.nextInt(60) &#x2F; 10.0) + &quot;&quot;);</span><br><span class="line">        return (JSONObject) JSON.toJSON(appBase);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * 商品展示事件</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateDisplay() &#123;</span><br><span class="line">        AppDisplay appDisplay &#x3D; new AppDisplay();</span><br><span class="line">        boolean boolFlag &#x3D; rand.nextInt(10) &lt; 7;</span><br><span class="line">        &#x2F;&#x2F; 动作：曝光商品&#x3D;1，点击商品&#x3D;2，</span><br><span class="line">        if (boolFlag) &#123;</span><br><span class="line">            appDisplay.setAction(&quot;1&quot;);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            appDisplay.setAction(&quot;2&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 商品id</span><br><span class="line">        String goodsId &#x3D; s_goodsid + &quot;&quot;;</span><br><span class="line">        s_goodsid++;</span><br><span class="line">        
appDisplay.setGoodsid(goodsId);</span><br><span class="line">        &#x2F;&#x2F; 顺序  设置成6条吧</span><br><span class="line">        int flag &#x3D; rand.nextInt(6);</span><br><span class="line">        appDisplay.setPlace(&quot;&quot; + flag);</span><br><span class="line">        &#x2F;&#x2F; 曝光类型</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(2);</span><br><span class="line">        appDisplay.setExtend1(&quot;&quot; + flag);</span><br><span class="line">        &#x2F;&#x2F; 分类</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(100);</span><br><span class="line">        appDisplay.setCategory(&quot;&quot; + flag);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(appDisplay);</span><br><span class="line">        return packEventJson(&quot;display&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * 商品详情页</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateNewsDetail() &#123;</span><br><span class="line">        AppNewsDetail appNewsDetail &#x3D; new AppNewsDetail();</span><br><span class="line">        &#x2F;&#x2F; 页面入口来源</span><br><span class="line">        int flag &#x3D; 1 + rand.nextInt(3);</span><br><span class="line">        appNewsDetail.setEntry(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; 动作</span><br><span class="line">        appNewsDetail.setAction(&quot;&quot; + (rand.nextInt(4) + 1));</span><br><span class="line">        &#x2F;&#x2F; 商品id</span><br><span class="line">        appNewsDetail.setGoodsid(s_goodsid + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; 商品来源类型</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(3);</span><br><span class="line">        appNewsDetail.setShowtype(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; 商品样式</span><br><span 
class="line">        flag &#x3D; rand.nextInt(6);</span><br><span class="line">        appNewsDetail.setShowtype(&quot;&quot; + flag);</span><br><span class="line">        &#x2F;&#x2F; Time spent on the page</span><br><span class="line">        flag &#x3D; rand.nextInt(10) * rand.nextInt(7);</span><br><span class="line">        appNewsDetail.setNews_staytime(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Loading time</span><br><span class="line">        flag &#x3D; rand.nextInt(10) * rand.nextInt(7);</span><br><span class="line">        appNewsDetail.setLoading_time(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Load failure code</span><br><span class="line">        flag &#x3D; rand.nextInt(10);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 1:</span><br><span class="line">                appNewsDetail.setType1(&quot;102&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appNewsDetail.setType1(&quot;201&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 3:</span><br><span class="line">                appNewsDetail.setType1(&quot;325&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 4:</span><br><span class="line">                appNewsDetail.setType1(&quot;433&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 5:</span><br><span class="line">                appNewsDetail.setType1(&quot;542&quot;);</span><br><span class="line">                break;</span><br><span class="line">            default:</span><br><span class="line">                appNewsDetail.setType1(&quot;&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 
Category</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(100);</span><br><span class="line">        appNewsDetail.setCategory(&quot;&quot; + flag);</span><br><span class="line">        JSONObject eventJson &#x3D; (JSONObject) JSON.toJSON(appNewsDetail);</span><br><span class="line">        return packEventJson(&quot;newsdetail&quot;, eventJson);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Product list</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateNewList() &#123;</span><br><span class="line">        AppLoading appLoading &#x3D; new AppLoading();</span><br><span class="line">        &#x2F;&#x2F; Action</span><br><span class="line">        int flag &#x3D; rand.nextInt(3) + 1;</span><br><span class="line">        appLoading.setAction(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Loading time</span><br><span class="line">        flag &#x3D; rand.nextInt(10) * rand.nextInt(7);</span><br><span class="line">        appLoading.setLoading_time(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Failure code</span><br><span class="line">        flag &#x3D; rand.nextInt(10);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 1:</span><br><span class="line">                appLoading.setType1(&quot;102&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appLoading.setType1(&quot;201&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 3:</span><br><span class="line">                appLoading.setType1(&quot;325&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 4:</span><br><span class="line">                
appLoading.setType1(&quot;433&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 5:</span><br><span class="line">                appLoading.setType1(&quot;542&quot;);</span><br><span class="line">                break;</span><br><span class="line">            default:</span><br><span class="line">                appLoading.setType1(&quot;&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Page loading type</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(2);</span><br><span class="line">        appLoading.setLoading_way(&quot;&quot; + flag);</span><br><span class="line">        &#x2F;&#x2F; Extension field 1</span><br><span class="line">        appLoading.setExtend1(&quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Extension field 2</span><br><span class="line">        appLoading.setExtend2(&quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; User loading type</span><br><span class="line">        flag &#x3D; 1 + rand.nextInt(3);</span><br><span class="line">        appLoading.setType(&quot;&quot; + flag);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(appLoading);</span><br><span class="line">        return packEventJson(&quot;loading&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Ad-related fields</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateAd() &#123;</span><br><span class="line">        AppAd appAd &#x3D; new AppAd();</span><br><span class="line">        &#x2F;&#x2F; Entry</span><br><span class="line">        int flag &#x3D; rand.nextInt(3) + 1;</span><br><span class="line">        appAd.setEntry(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Action</span><br><span class="line">        flag &#x3D; 
rand.nextInt(5) + 1;</span><br><span class="line">        appAd.setAction(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Content type (overwritten below by the item&#x2F;activity choice)</span><br><span class="line">        flag &#x3D; rand.nextInt(6) + 1;</span><br><span class="line">        appAd.setContentType(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Display duration in milliseconds (1000 to 120999)</span><br><span class="line">        flag &#x3D; rand.nextInt(120000) + 1000;</span><br><span class="line">        appAd.setDisplayMills(flag + &quot;&quot;);</span><br><span class="line">        flag &#x3D; rand.nextInt(2); &#x2F;&#x2F; 0 or 1: randomly emit an item ad or an activity ad</span><br><span class="line">        if (flag &#x3D;&#x3D; 1) &#123;</span><br><span class="line">            appAd.setContentType(flag + &quot;&quot;);</span><br><span class="line">            flag &#x3D; rand.nextInt(6);</span><br><span class="line">            appAd.setItemId(flag + &quot;&quot;);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            appAd.setContentType(flag + &quot;&quot;);</span><br><span class="line">            flag &#x3D; rand.nextInt(1) + 1;</span><br><span class="line">            appAd.setActivityId(flag + &quot;&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(appAd);</span><br><span class="line">        return packEventJson(&quot;ad&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Startup log</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static AppStart generateStart() &#123;</span><br><span class="line">        AppStart appStart &#x3D; new AppStart();</span><br><span class="line">        &#x2F;&#x2F; Device id</span><br><span class="line">        appStart.setMid(s_mid + &quot;&quot;);</span><br><span class="line">        s_mid++;</span><br><span class="line">        &#x2F;&#x2F; User id</span><br><span class="line">     
   appStart.setUid(s_uid + &quot;&quot;);</span><br><span class="line">        s_uid++;</span><br><span class="line">        &#x2F;&#x2F; App version code, e.g. 5, 6</span><br><span class="line">        appStart.setVc(&quot;&quot; + rand.nextInt(20));</span><br><span class="line">        &#x2F;&#x2F; App version name, e.g. v1.1.1</span><br><span class="line">        appStart.setVn(&quot;1.&quot; + rand.nextInt(4) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Android OS version</span><br><span class="line">        appStart.setOs(&quot;8.&quot; + rand.nextInt(3) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Log type</span><br><span class="line">        appStart.setEn(&quot;start&quot;);</span><br><span class="line">        &#x2F;&#x2F; Language: es, en, pt</span><br><span class="line">        int flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case (0):</span><br><span class="line">                appStart.setL(&quot;es&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case (1):</span><br><span class="line">                appStart.setL(&quot;en&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case (2):</span><br><span class="line">                appStart.setL(&quot;pt&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Channel code: which channel the user came from</span><br><span class="line">        appStart.setSr(getRandomChar(1));</span><br><span class="line">        &#x2F;&#x2F; Region: BR or MX</span><br><span class="line">        flag &#x3D; rand.nextInt(2);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appStart.setAr(&quot;BR&quot;); break; &#x2F;&#x2F; break so that BR does not fall through to MX</span><br><span class="line">            case 
1:</span><br><span class="line">                appStart.setAr(&quot;MX&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Phone brand (ba) and model (md); the model suffix is just a number of at most 2 digits</span><br><span class="line">        flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appStart.setBa(&quot;Sumsung&quot;);</span><br><span class="line">                appStart.setMd(&quot;sumsung-&quot; + rand.nextInt(20));</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">                appStart.setBa(&quot;Huawei&quot;);</span><br><span class="line">                appStart.setMd(&quot;Huawei-&quot; + rand.nextInt(20));</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appStart.setBa(&quot;HTC&quot;);</span><br><span class="line">                appStart.setMd(&quot;HTC-&quot; + rand.nextInt(20));</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Version of the embedded SDK</span><br><span class="line">        appStart.setSv(&quot;V2.&quot; + rand.nextInt(10) + &quot;.&quot; + rand.nextInt(10));</span><br><span class="line">        &#x2F;&#x2F; Gmail address</span><br><span class="line">        appStart.setG(getRandomCharAndNumr(8) + &quot;@gmail.com&quot;);</span><br><span class="line">        &#x2F;&#x2F; Screen width*height (hw)</span><br><span class="line">        flag &#x3D; rand.nextInt(4);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appStart.setHw(&quot;640*960&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">    
            appStart.setHw(&quot;640*1136&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appStart.setHw(&quot;750*1134&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 3:</span><br><span class="line">                appStart.setHw(&quot;1080*1920&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Time the client generated the log (up to about 28 hours before now)</span><br><span class="line">        long millis &#x3D; System.currentTimeMillis();</span><br><span class="line">        appStart.setT(&quot;&quot; + (millis - rand.nextInt(99999999)));</span><br><span class="line">        &#x2F;&#x2F; Network type: 3G, 4G, WIFI</span><br><span class="line">        flag &#x3D; rand.nextInt(3);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 0:</span><br><span class="line">                appStart.setNw(&quot;3G&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 1:</span><br><span class="line">                appStart.setNw(&quot;4G&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appStart.setNw(&quot;WIFI&quot;);</span><br><span class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Roughly Latin America: longitude 34°46′W to 117°09′W; latitude 32°42′N to 53°54′S</span><br><span class="line">        &#x2F;&#x2F; Longitude</span><br><span class="line">        appStart.setLn((-34 - rand.nextInt(83) - rand.nextInt(60) &#x2F; 10.0) + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Latitude</span><br><span class="line">        appStart.setLa((32 - rand.nextInt(85) - rand.nextInt(60) &#x2F; 10.0) + &quot;&quot;);</span><br><span 
class="line">        &#x2F;&#x2F; Entry</span><br><span class="line">        flag &#x3D; rand.nextInt(5) + 1;</span><br><span class="line">        appStart.setEntry(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Splash-screen ad type</span><br><span class="line">        flag &#x3D; rand.nextInt(2) + 1;</span><br><span class="line">        appStart.setOpen_ad_type(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Status: 2 with 10% probability, otherwise 1</span><br><span class="line">        flag &#x3D; rand.nextInt(10) &gt; 8 ? 2 : 1;</span><br><span class="line">        appStart.setAction(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Loading time</span><br><span class="line">        appStart.setLoading_time(rand.nextInt(20) + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Failure code</span><br><span class="line">        flag &#x3D; rand.nextInt(10);</span><br><span class="line">        switch (flag) &#123;</span><br><span class="line">            case 1:</span><br><span class="line">                appStart.setDetail(&quot;102&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 2:</span><br><span class="line">                appStart.setDetail(&quot;201&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 3:</span><br><span class="line">                appStart.setDetail(&quot;325&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 4:</span><br><span class="line">                appStart.setDetail(&quot;433&quot;);</span><br><span class="line">                break;</span><br><span class="line">            case 5:</span><br><span class="line">                appStart.setDetail(&quot;542&quot;);</span><br><span class="line">                break;</span><br><span class="line">            default:</span><br><span class="line">                appStart.setDetail(&quot;&quot;);</span><br><span 
class="line">                break;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; Extension field</span><br><span class="line">        appStart.setExtend1(&quot;&quot;);</span><br><span class="line">        return appStart;</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Message notification</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateNotification() &#123;</span><br><span class="line">        AppNotification appNotification &#x3D; new AppNotification();</span><br><span class="line">        int flag &#x3D; rand.nextInt(4) + 1;</span><br><span class="line">        &#x2F;&#x2F; Action</span><br><span class="line">        appNotification.setAction(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Notification id (stored in the type field)</span><br><span class="line">        flag &#x3D; rand.nextInt(4) + 1;</span><br><span class="line">        appNotification.setType(flag + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Time the client showed the notification</span><br><span class="line">        appNotification.setAp_time((System.currentTimeMillis() - rand.nextInt(99999999)) + &quot;&quot;);</span><br><span class="line">        &#x2F;&#x2F; Reserved field</span><br><span class="line">        appNotification.setContent(&quot;&quot;);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(appNotification);</span><br><span class="line">        return packEventJson(&quot;notification&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Background activity</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateBackground() &#123;</span><br><span class="line">        AppActive appActive_background &#x3D; new AppActive();</span><br><span class="line">        &#x2F;&#x2F; Launch source</span><br><span class="line">       
 int flag &#x3D; rand.nextInt(3) + 1;</span><br><span class="line">        appActive_background.setActive_source(flag + &quot;&quot;);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(appActive_background);</span><br><span class="line">        return packEventJson(&quot;active_background&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Error log data</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateError() &#123;</span><br><span class="line">        AppErrorLog appErrorLog &#x3D; new AppErrorLog();</span><br><span class="line">        String[] errorBriefs &#x3D; &#123;&quot;at cn.lift.dfdf.web.AbstractBaseController.validInbound(AbstractBaseController.java:72)&quot;, &quot;at cn.lift.appIn.control.CommandUtil.getInfo(CommandUtil.java:67)&quot;&#125;;        &#x2F;&#x2F; Error summaries</span><br><span class="line">        String[] errorDetails &#x3D; &#123;&quot;java.lang.NullPointerException\\n    &quot; + &quot;at cn.lift.appIn.web.AbstractBaseController.validInbound(AbstractBaseController.java:72)\\n &quot; + &quot;at cn.lift.dfdf.web.AbstractBaseController.validInbound&quot;, &quot;at cn.lift.dfdfdf.control.CommandUtil.getInfo(CommandUtil.java:67)\\n &quot; + &quot;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\\n&quot; + &quot; at java.lang.reflect.Method.invoke(Method.java:606)\\n&quot;&#125;;        &#x2F;&#x2F; Error details</span><br><span class="line">        &#x2F;&#x2F; Pick a random error summary</span><br><span class="line">        appErrorLog.setErrorBrief(errorBriefs[rand.nextInt(errorBriefs.length)]);</span><br><span class="line">        &#x2F;&#x2F; Pick a random error detail</span><br><span class="line">        appErrorLog.setErrorDetail(errorDetails[rand.nextInt(errorDetails.length)]);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) 
JSON.toJSON(appErrorLog);</span><br><span class="line">        return packEventJson(&quot;error&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Wrap an event with the fields shared by all event types: timestamp (ett), event name (en) and the JSON payload (kv)</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject packEventJson(String eventName, JSONObject jsonObject) &#123;</span><br><span class="line">        JSONObject eventJson &#x3D; new JSONObject();</span><br><span class="line">        eventJson.put(&quot;ett&quot;, (System.currentTimeMillis() - rand.nextInt(99999999)) + &quot;&quot;);</span><br><span class="line">        eventJson.put(&quot;en&quot;, eventName);</span><br><span class="line">        eventJson.put(&quot;kv&quot;, jsonObject);</span><br><span class="line">        return eventJson;</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Generate a random string of uppercase letters</span><br><span class="line">     *</span><br><span class="line">     * @param length string length</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static String getRandomChar(Integer length) &#123;</span><br><span class="line">        StringBuilder str &#x3D; new StringBuilder();</span><br><span class="line">        Random random &#x3D; new Random();</span><br><span class="line">        for (int i &#x3D; 0; i &lt; length; i++) &#123;</span><br><span class="line">            &#x2F;&#x2F; Letter</span><br><span class="line">            str.append((char) (65 + random.nextInt(26))); &#x2F;&#x2F; random uppercase letter A-Z</span><br><span class="line">        &#125;</span><br><span class="line">        return str.toString();</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Generate a random alphanumeric string</span><br><span class="line">     *</span><br><span class="line">     * @param length string length</span><br><span class="line">     
*&#x2F;</span><br><span class="line">    private static String getRandomCharAndNumr(Integer length) &#123;</span><br><span class="line">        StringBuilder str &#x3D; new StringBuilder();</span><br><span class="line">        Random random &#x3D; new Random();</span><br><span class="line">        for (int i &#x3D; 0; i &lt; length; i++) &#123;</span><br><span class="line">            boolean b &#x3D; random.nextBoolean();</span><br><span class="line">            if (b) &#123; &#x2F;&#x2F; letter</span><br><span class="line">                &#x2F;&#x2F; int choice &#x3D; random.nextBoolean() ? 65 : 97; would pick 65 (uppercase) or 97 (lowercase)</span><br><span class="line">                str.append((char) (65 + random.nextInt(26))); &#x2F;&#x2F; random uppercase letter A-Z</span><br><span class="line">            &#125; else &#123; &#x2F;&#x2F; digit</span><br><span class="line">                str.append(String.valueOf(random.nextInt(10)));</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        return str.toString();</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Favorites</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateFavorites() &#123;</span><br><span class="line">        AppFavorites favorites &#x3D; new AppFavorites();</span><br><span class="line">        favorites.setCourse_id(rand.nextInt(10));</span><br><span class="line">        favorites.setUserid(rand.nextInt(10));</span><br><span class="line">        favorites.setAdd_time((System.currentTimeMillis() - rand.nextInt(99999999)) + &quot;&quot;);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(favorites);</span><br><span class="line">        return packEventJson(&quot;favorites&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * 
Likes</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generatePraise() &#123;</span><br><span class="line">        AppPraise praise &#x3D; new AppPraise();</span><br><span class="line">        praise.setId(rand.nextInt(10));</span><br><span class="line">        praise.setUserid(rand.nextInt(10));</span><br><span class="line">        praise.setTarget_id(rand.nextInt(10));</span><br><span class="line">        praise.setType(rand.nextInt(4) + 1);</span><br><span class="line">        praise.setAdd_time((System.currentTimeMillis() - rand.nextInt(99999999)) + &quot;&quot;);</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(praise);</span><br><span class="line">        return packEventJson(&quot;praise&quot;, jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Comments</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static JSONObject generateComment() &#123;</span><br><span class="line">        AppComment comment &#x3D; new AppComment();</span><br><span class="line">        comment.setComment_id(rand.nextInt(10));</span><br><span class="line">        comment.setUserid(rand.nextInt(10));</span><br><span class="line">        comment.setP_comment_id(rand.nextInt(5));</span><br><span class="line">        comment.setContent(getCONTENT());</span><br><span class="line">        comment.setAddtime((System.currentTimeMillis() - rand.nextInt(99999999)) + &quot;&quot;);</span><br><span class="line">        comment.setOther_id(rand.nextInt(10));</span><br><span class="line">        comment.setPraise_count(rand.nextInt(1000));</span><br><span class="line">        comment.setReply_count(rand.nextInt(200));</span><br><span class="line">        JSONObject jsonObject &#x3D; (JSONObject) JSON.toJSON(comment);</span><br><span class="line">        return packEventJson(&quot;comment&quot;, 
jsonObject);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Generate a single random Chinese character</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static char getRandomChar() &#123;</span><br><span class="line">        String str &#x3D; &quot;&quot;;</span><br><span class="line">        int highPos; &#x2F;&#x2F; high (first) byte</span><br><span class="line">        int lowPos;  &#x2F;&#x2F; low (second) byte</span><br><span class="line">        Random random &#x3D; new Random();</span><br><span class="line">        &#x2F;&#x2F; Pick the two bytes of a random GB2312 character: high byte 0xB0-0xD6, low byte 0xA1-0xFD</span><br><span class="line">        highPos &#x3D; 176 + random.nextInt(39);</span><br><span class="line">        lowPos &#x3D; 161 + random.nextInt(93);</span><br><span class="line">        byte[] b &#x3D; new byte[2];</span><br><span class="line">        b[0] &#x3D; (byte) highPos;</span><br><span class="line">        b[1] &#x3D; (byte) lowPos;</span><br><span class="line">        try &#123;</span><br><span class="line">            str &#x3D; new String(b, &quot;GBK&quot;);</span><br><span class="line">        &#125; catch (UnsupportedEncodingException e) &#123;</span><br><span class="line">            e.printStackTrace();</span><br><span class="line">            System.out.println(&quot;Error: GBK encoding is not supported&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">        return str.charAt(0);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;**</span><br><span class="line">     * Concatenate 0-99 random Chinese characters (the length is drawn once, before the loop)</span><br><span class="line">     *&#x2F;</span><br><span class="line">    private static String getCONTENT() &#123;</span><br><span class="line">        StringBuilder str &#x3D; new StringBuilder();</span><br><span class="line">        for (int i &#x3D; 0, n &#x3D; rand.nextInt(100); i &lt; n; i++) &#123;</span><br><span class="line">            str.append(getRandomChar());</span><br><span class="line">        
&#125;</span><br><span class="line">        return str.toString();</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="d6c3b682"></a></p><h4 id="2-4-5-启动测试"><a href="#2-4-5-启动测试" class="headerlink" title="2.4.5 启动测试"></a>2.4.5 Launch and Test</h4><blockquote><p>Note: the log simulator must be deployed on 2 servers. The simulated logs contain both common logs and event logs, so a Flume interceptor is needed to route them, and two flume-ng agents are needed to do this.<br>Package the project and upload it to both server nodes, then generate data in preparation for the later tests. Here the jar is placed in the test folder under the user's home directory.</p></blockquote><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445231-a9463aec-bcbe-4716-9e21-e761f37fc57e.png#align=left&display=inline&height=64&margin=%5Bobject%20Object%5D&originHeight=64&originWidth=1152&size=0&status=done&style=none&width=1152" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445356-4fb1c879-70b3-4a9b-aecd-98bc3096d13d.png#align=left&display=inline&height=61&margin=%5Bobject%20Object%5D&originHeight=61&originWidth=1150&size=0&status=done&style=none&width=1150" alt></p><blockquote><p>Command-line arguments control the generation rate and volume (below: one record every 2 seconds, 1000 records in total).</p></blockquote><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"># Arguments: delay between records (ms) and number of records</span><br><span class="line">nohup java -jar data-producer-1.0-SNAPSHOT-jar-with-dependencies.jar 2000 1000 &amp;</span><br><span class="line"># Follow the generated logs</span><br><span class="line">tail -F &#x2F;root&#x2F;logs&#x2F;*.log</span><br></pre></td></tr></table></figure><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445278-2645af1d-a1c6-49eb-9fae-9a2e54c6fdd1.png#align=left&display=inline&height=125&margin=%5Bobject%20Object%5D&originHeight=125&originWidth=1204&size=0&status=done&style=none&width=1204" alt></p><blockquote><p>Use <a href="https://www.json.cn/" target="_blank" rel="external nofollow noopener noreferrer">www.json.cn</a> to inspect the data format.</p></blockquote><p><a 
name="blogTitle10"></a></p><h2 id="3-创建KafKa-Topic"><a href="#3-创建KafKa-Topic" class="headerlink" title="3 创建KafKa-Topic"></a>3 Create the Kafka Topics</h2><ul><li>Startup log topic: topic_start</li><li>Event log topic: topic_event</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445291-68eeb8c7-d2d0-4420-b26b-7008b185d7d4.png#align=left&display=inline&height=278&margin=%5Bobject%20Object%5D&originHeight=278&originWidth=918&size=0&status=done&style=none&width=918" alt><br><a name="blogTitle11"></a></p><h2 id="4-Flume准备"><a href="#4-Flume准备" class="headerlink" title="4 Flume准备"></a>4 Flume Setup</h2><blockquote><p>Flume is split into 2 groups of agents.<br>Group 1: collects the server logs and uses Kafka Channels to send them to different Kafka topics, with interceptors routing common logs and event logs apart.<br>Group 2: consumes the Kafka data, buffers it in a File Channel, and finally writes it to HDFS.</p></blockquote><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445578-b0c40759-4e33-4b63-899b-6757a21dca2b.png#align=left&display=inline&height=821&margin=%5Bobject%20Object%5D&originHeight=821&originWidth=1920&size=0&status=done&style=none&width=1920" alt><br><a name="blogTitle12"></a></p><h3 id="4-1-Flume：File-gt-Kafka配置编写"><a href="#4-1-Flume：File-gt-Kafka配置编写" class="headerlink" title="4.1 Flume：File-&gt;Kafka配置编写"></a>4.1 Flume: File-&gt;Kafka Configuration</h3><ul><li>vim /root/test/file-flume-kafka.conf<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"># 1 Define the components</span><br><span class="line">a1.sources&#x3D;r1</span><br><span class="line">a1.channels&#x3D;c1 c2</span><br><span class="line"># 2 Source: type is the source type, positionFile records the read position, filegroups lists the files to read (app.+ matches files whose names start with app), channels sets where events go</span><br><span class="line">a1.sources.r1.type &#x3D; TAILDIR</span><br><span class="line">a1.sources.r1.positionFile &#x3D; &#x2F;root&#x2F;test&#x2F;flume&#x2F;log_position.json</span><br><span class="line">a1.sources.r1.filegroups &#x3D; f1</span><br><span class="line">a1.sources.r1.filegroups.f1 &#x3D; &#x2F;root&#x2F;logs&#x2F;app.+</span><br><span class="line">a1.sources.r1.fileHeader &#x3D; true</span><br><span class="line">a1.sources.r1.channels &#x3D; c1 c2</span><br><span class="line"># 3 Interceptors: both are custom; the multiplexing selector routes events by the topic header, and each mapping sends one header value to one channel</span><br><span class="line">a1.sources.r1.interceptors &#x3D; i1 i2</span><br><span class="line">a1.sources.r1.interceptors.i1.type &#x3D; com.heaton.bigdata.flume.LogETLInterceptor$Builder</span><br><span class="line">a1.sources.r1.interceptors.i2.type &#x3D; com.heaton.bigdata.flume.LogTypeInterceptor$Builder</span><br><span class="line">a1.sources.r1.selector.type &#x3D; multiplexing</span><br><span class="line">a1.sources.r1.selector.header &#x3D; topic</span><br><span class="line">a1.sources.r1.selector.mapping.topic_start &#x3D; c1</span><br><span class="line">a1.sources.r1.selector.mapping.topic_event &#x3D; c2</span><br><span class="line"># 4 Channels: KafkaChannel</span><br><span class="line">a1.channels.c1.type &#x3D; org.apache.flume.channel.kafka.KafkaChannel</span><br><span class="line">a1.channels.c1.kafka.bootstrap.servers &#x3D; 
cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092</span><br><span class="line">a1.channels.c1.kafka.topic &#x3D; topic_start</span><br><span class="line">a1.channels.c1.parseAsFlumeEvent &#x3D; false</span><br><span class="line">a1.channels.c1.kafka.consumer.group.id &#x3D; flume-consumer</span><br><span class="line">a1.channels.c2.type &#x3D; org.apache.flume.channel.kafka.KafkaChannel</span><br><span class="line">a1.channels.c2.kafka.bootstrap.servers &#x3D; cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092</span><br><span class="line">a1.channels.c2.kafka.topic &#x3D; topic_event</span><br><span class="line">a1.channels.c2.parseAsFlumeEvent &#x3D; false</span><br><span class="line">a1.channels.c2.kafka.consumer.group.id &#x3D; flume-consumer</span><br></pre></td></tr></table></figure><blockquote><p>Create this Flume configuration file on both servers that produce logs (cdh02.cm and cdh03.cm).<br>LogETLInterceptor and LogTypeInterceptor are custom interceptors.</p></blockquote></li></ul><p><a name="blogTitle13"></a></p><h3 id="4-2-自定义拦截器"><a href="#4-2-自定义拦截器" class="headerlink" title="4.2 自定义拦截器"></a>4.2 Custom Interceptors</h3><blockquote><p>In the data-flume project</p></blockquote><ul><li><p>LogUtils</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line">import org.apache.commons.lang.math.NumberUtils;</span><br><span class="line">public class LogUtils &#123;</span><br><span class="line">    public static boolean validateEvent(String log) &#123;</span><br><span class="line">        &#x2F;** Event log format: server time (ms) | JSON</span><br><span class="line">         1588319303710|&#123;</span><br><span class="line">         &quot;cm&quot;:&#123;</span><br><span class="line">         &quot;ln&quot;:&quot;-51.5&quot;,&quot;sv&quot;:&quot;V2.0.7&quot;,&quot;os&quot;:&quot;8.0.8&quot;,&quot;g&quot;:&quot;L1470998@gmail.com&quot;,&quot;mid&quot;:&quot;13&quot;,</span><br><span class="line">         
&quot;nw&quot;:&quot;4G&quot;,&quot;l&quot;:&quot;en&quot;,&quot;vc&quot;:&quot;7&quot;,&quot;hw&quot;:&quot;640*960&quot;,&quot;ar&quot;:&quot;MX&quot;,&quot;uid&quot;:&quot;13&quot;,&quot;t&quot;:&quot;1588291826938&quot;,</span><br><span class="line">         &quot;la&quot;:&quot;-38.2&quot;,&quot;md&quot;:&quot;Huawei-14&quot;,&quot;vn&quot;:&quot;1.3.6&quot;,&quot;ba&quot;:&quot;Huawei&quot;,&quot;sr&quot;:&quot;Y&quot;</span><br><span class="line">         &#125;,</span><br><span class="line">         &quot;ap&quot;:&quot;app&quot;,</span><br><span class="line">         &quot;et&quot;:[&#123;</span><br><span class="line">                &quot;ett&quot;:&quot;1588228193191&quot;,&quot;en&quot;:&quot;ad&quot;,&quot;kv&quot;:&#123;&quot;activityId&quot;:&quot;1&quot;,&quot;displayMills&quot;:&quot;113201&quot;,&quot;entry&quot;:&quot;3&quot;,&quot;action&quot;:&quot;5&quot;,&quot;contentType&quot;:&quot;0&quot;&#125;</span><br><span class="line">                &#125;,&#123;</span><br><span class="line">                &quot;ett&quot;:&quot;1588300304713&quot;,&quot;en&quot;:&quot;notification&quot;,&quot;kv&quot;:&#123;&quot;ap_time&quot;:&quot;1588277440794&quot;,&quot;action&quot;:&quot;2&quot;,&quot;type&quot;:&quot;3&quot;,&quot;content&quot;:&quot;&quot;&#125;</span><br><span class="line">                &#125;,&#123;</span><br><span class="line">                &quot;ett&quot;:&quot;1588249203743&quot;,&quot;en&quot;:&quot;active_background&quot;,&quot;kv&quot;:&#123;&quot;active_source&quot;:&quot;3&quot;&#125;</span><br><span class="line">                &#125;,&#123;</span><br><span class="line">                &quot;ett&quot;:&quot;1588254200122&quot;,&quot;en&quot;:&quot;favorites&quot;,&quot;kv&quot;:&#123;&quot;course_id&quot;:5,&quot;id&quot;:0,&quot;add_time&quot;:&quot;1588264138625&quot;,&quot;userid&quot;:0&#125;</span><br><span class="line">                &#125;,&#123;</span><br><span class="line">                
&quot;ett&quot;:&quot;1588281152824&quot;,&quot;en&quot;:&quot;praise&quot;,&quot;kv&quot;:&#123;&quot;target_id&quot;:4,&quot;id&quot;:3,&quot;type&quot;:3,&quot;add_time&quot;:&quot;1588307696417&quot;,&quot;userid&quot;:8&#125;</span><br><span class="line">                &#125;]</span><br><span class="line">         &#125;</span><br><span class="line">         *&#x2F;</span><br><span class="line">        &#x2F;&#x2F; 1 Split into the server time and the JSON body</span><br><span class="line">        String[] logContents &#x3D; log.split(&quot;\\|&quot;);</span><br><span class="line">        &#x2F;&#x2F; 2 Expect exactly two parts</span><br><span class="line">        if (logContents.length !&#x3D; 2) &#123;</span><br><span class="line">            return false;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 3 The server time must be a 13-digit millisecond timestamp</span><br><span class="line">        if (logContents[0].length() !&#x3D; 13 || !NumberUtils.isDigits(logContents[0])) &#123;</span><br><span class="line">            return false;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 4 The JSON body must be wrapped in braces</span><br><span class="line">        if (!logContents[1].trim().startsWith(&quot;&#123;&quot;)</span><br><span class="line">                || !logContents[1].trim().endsWith(&quot;&#125;&quot;)) &#123;</span><br><span class="line">            return false;</span><br><span class="line">        &#125;</span><br><span class="line">        return true;</span><br><span class="line">    &#125;</span><br><span class="line">    public static boolean validateStart(String log) &#123;</span><br><span class="line">        &#x2F;** Startup log format: a single JSON object</span><br><span class="line">         &#123;</span><br><span class="line">         &quot;action&quot;:&quot;1&quot;,&quot;ar&quot;:&quot;MX&quot;,&quot;ba&quot;:&quot;HTC&quot;,&quot;detail&quot;:&quot;201&quot;,&quot;en&quot;:&quot;start&quot;,&quot;entry&quot;:&quot;4&quot;,&quot;extend1&quot;:&quot;&quot;,</span><br><span class="line">         
&quot;g&quot;:&quot;4Z174142@gmail.com&quot;,&quot;hw&quot;:&quot;750*1134&quot;,&quot;l&quot;:&quot;pt&quot;,&quot;la&quot;:&quot;-29.7&quot;,&quot;ln&quot;:&quot;-48.1&quot;,&quot;loading_time&quot;:&quot;0&quot;,</span><br><span class="line">         &quot;md&quot;:&quot;HTC-18&quot;,&quot;mid&quot;:&quot;14&quot;,&quot;nw&quot;:&quot;3G&quot;,&quot;open_ad_type&quot;:&quot;2&quot;,&quot;os&quot;:&quot;8.0.8&quot;,&quot;sr&quot;:&quot;D&quot;,&quot;sv&quot;:&quot;V2.8.2&quot;,</span><br><span class="line">         &quot;t&quot;:&quot;1588251833523&quot;,&quot;uid&quot;:&quot;14&quot;,&quot;vc&quot;:&quot;15&quot;,&quot;vn&quot;:&quot;1.2.9&quot;</span><br><span class="line">         &#125;</span><br><span class="line">        *&#x2F;</span><br><span class="line">        if (log &#x3D;&#x3D; null) &#123;</span><br><span class="line">            return false;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; The startup log must be wrapped in braces</span><br><span class="line">        if (!log.trim().startsWith(&quot;&#123;&quot;) || !log.trim().endsWith(&quot;&#125;&quot;)) &#123;</span><br><span class="line">            return false;</span><br><span class="line">        &#125;</span><br><span class="line">        return true;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p>LogETLInterceptor</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line">import org.apache.flume.Context;</span><br><span class="line">import org.apache.flume.Event;</span><br><span class="line">import org.apache.flume.interceptor.Interceptor;</span><br><span class="line">import java.nio.charset.Charset;</span><br><span class="line">import java.util.ArrayList;</span><br><span class="line">import java.util.List;</span><br><span class="line">public class LogETLInterceptor implements Interceptor &#123;</span><br><span class="line">    @Override</span><br><span class="line">    public void initialize() &#123;</span><br><span class="line">        &#x2F;&#x2F; Initialization: nothing to set up</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public Event intercept(Event 
event) &#123;</span><br><span class="line">        &#x2F;&#x2F; 1 Read the event body as a UTF-8 string</span><br><span class="line">        byte[] body &#x3D; event.getBody();</span><br><span class="line">        String log &#x3D; new String(body, Charset.forName(&quot;UTF-8&quot;));</span><br><span class="line">        &#x2F;&#x2F; 2 Validate according to the log type (startup log or event log)</span><br><span class="line">        if (log.contains(&quot;start&quot;)) &#123;</span><br><span class="line">            if (LogUtils.validateStart(log)) &#123;</span><br><span class="line">                return event;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            if (LogUtils.validateEvent(log)) &#123;</span><br><span class="line">                return event;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        &#x2F;&#x2F; 3 Validation failed: returning null drops the event</span><br><span class="line">        return null;</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public List&lt;Event&gt; intercept(List&lt;Event&gt; events) &#123;</span><br><span class="line">        ArrayList&lt;Event&gt; interceptors &#x3D; new ArrayList&lt;&gt;();</span><br><span class="line">        for (Event event : events) &#123;</span><br><span class="line">            Event intercept1 &#x3D; intercept(event);</span><br><span class="line">            if (intercept1 !&#x3D; null) &#123;</span><br><span class="line">                interceptors.add(intercept1);</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        return interceptors;</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public void close() &#123;</span><br><span class="line">        &#x2F;&#x2F; Close: nothing to release</span><br><span class="line">    &#125;</span><br><span class="line">    
public static class Builder implements Interceptor.Builder &#123;</span><br><span class="line">        @Override</span><br><span class="line">        public Interceptor build() &#123;</span><br><span class="line">            return new LogETLInterceptor();</span><br><span class="line">        &#125;</span><br><span class="line">        @Override</span><br><span class="line">        public void configure(Context context) &#123;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p>LogTypeInterceptor</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span 
class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line">import org.apache.flume.Context;</span><br><span class="line">import org.apache.flume.Event;</span><br><span class="line">import org.apache.flume.interceptor.Interceptor;</span><br><span class="line">import java.nio.charset.Charset;</span><br><span class="line">import java.util.ArrayList;</span><br><span class="line">import java.util.List;</span><br><span class="line">import java.util.Map;</span><br><span class="line">public class LogTypeInterceptor implements Interceptor &#123;</span><br><span class="line">    @Override</span><br><span class="line">    public void initialize() &#123;</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public Event intercept(Event event) &#123;</span><br><span class="line">        &#x2F;&#x2F; Classify the log type from the body and record it in a header</span><br><span class="line">        &#x2F;&#x2F; 1 Read the body as a UTF-8 string</span><br><span class="line">        byte[] body &#x3D; event.getBody();</span><br><span class="line">        String log &#x3D; new String(body, Charset.forName(&quot;UTF-8&quot;));</span><br><span class="line">        &#x2F;&#x2F; 2 Get the headers</span><br><span class="line">        Map&lt;String, String&gt; headers &#x3D; event.getHeaders();</span><br><span class="line">        &#x2F;&#x2F; 3 Set the topic header according to the log type</span><br><span class="line">        if (log.contains(&quot;start&quot;)) &#123;</span><br><span class="line">            headers.put(&quot;topic&quot;, &quot;topic_start&quot;);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            headers.put(&quot;topic&quot;, &quot;topic_event&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">        return event;</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public List&lt;Event&gt; intercept(List&lt;Event&gt; events) &#123;</span><br><span class="line">        ArrayList&lt;Event&gt; interceptors &#x3D; new ArrayList&lt;&gt;();</span><br><span class="line">        for (Event event : events) &#123;</span><br><span class="line">            Event intercept1 &#x3D; intercept(event);</span><br><span class="line">            interceptors.add(intercept1);</span><br><span class="line">        &#125;</span><br><span class="line">        return interceptors;</span><br><span class="line">    &#125;</span><br><span class="line">    @Override</span><br><span class="line">    public void close() &#123;</span><br><span class="line">    &#125;</span><br><span class="line">    public static class Builder implements Interceptor.Builder &#123;</span><br><span class="line">        @Override</span><br><span class="line">        public Interceptor build() &#123;</span><br><span class="line">            return new LogTypeInterceptor();</span><br><span class="line">        &#125;</span><br><span class="line">        @Override</span><br><span class="line">        public void configure(Context context) &#123;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><blockquote><p>Package the project as a jar and copy it into Flume's lib directory on all nodes.<br>Reference path on CDH: /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/flume-ng/lib</p></blockquote></li></ul><p><a name="blogTitle14"></a></p><h3 id="4-3-Flume启停脚本"><a href="#4-3-Flume启停脚本" class="headerlink" title="4.3 Flume启停脚本"></a>4.3 Flume Start/Stop Script (File-&gt;Kafka)</h3><ul><li><p>vim /root/log-kafka-flume.sh</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">#! &#x2F;bin&#x2F;bash</span><br><span class="line">case $1 in</span><br><span class="line">&quot;start&quot;)&#123;</span><br><span class="line">for i in cdh02.cm cdh03.cm</span><br><span class="line">do</span><br><span class="line">echo &quot; --------Starting log-collection Flume on $i -------&quot;</span><br><span class="line">ssh $i &quot;nohup flume-ng agent --conf-file &#x2F;root&#x2F;test&#x2F;file-flume-kafka.conf --name a1 -Dflume.root.logger&#x3D;INFO,LOGFILE &gt;&#x2F;root&#x2F;test&#x2F;file-flume-kafka.log 2&gt;&amp;1 &amp;&quot;</span><br><span class="line">done</span><br><span class="line">&#125;;;</span><br><span class="line">&quot;stop&quot;)&#123;</span><br><span class="line">for i in cdh02.cm cdh03.cm</span><br><span class="line">do</span><br><span class="line">echo &quot; --------Stopping log-collection Flume on $i -------&quot;</span><br><span class="line">ssh $i &quot;ps -ef | grep file-flume-kafka | grep -v grep |awk &#39;&#123;print \$2&#125;&#39; | xargs kill&quot;</span><br><span class="line">done</span><br><span class="line">&#125;;;</span><br><span class="line">esac</span><br></pre></td></tr></table></figure><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445286-e3d74a0a-846a-497d-9395-835dc06bbe36.png#align=left&display=inline&height=378&margin=%5Bobject%20Object%5D&originHeight=378&originWidth=1509&size=0&status=done&style=none&width=1509" alt><br><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445293-b89b1d01-98f1-4ea2-9a5f-8b49b3bb9549.png#align=left&display=inline&height=385&margin=%5Bobject%20Object%5D&originHeight=385&originWidth=1511&size=0&status=done&style=none&width=1511" alt><br><a name="blogTitle15"></a></p><h3 id="4-4-Flume：Kafka-gt-HDFS配置编写"><a href="#4-4-Flume：Kafka-gt-HDFS配置编写" class="headerlink" title="4.4 Flume：Kafka-&gt;HDFS配置编写"></a>4.4 Flume: Kafka-&gt;HDFS Configuration</h3><blockquote><p>Prepare this on the third server (cdh01.cm).</p></blockquote></li><li><p>vim /root/test/kafka-flume-hdfs.conf</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span 
class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line">## Components</span><br><span class="line">a1.sources&#x3D;r1 r2</span><br><span class="line">a1.channels&#x3D;c1 c2</span><br><span class="line">a1.sinks&#x3D;k1 k2</span><br><span class="line">    </span><br><span class="line">## Kafka-source1</span><br><span class="line">a1.sources.r1.type &#x3D; org.apache.flume.source.kafka.KafkaSource</span><br><span class="line">a1.sources.r1.batchSize &#x3D; 5000</span><br><span class="line">a1.sources.r1.batchDurationMillis &#x3D; 2000</span><br><span class="line">a1.sources.r1.kafka.bootstrap.servers &#x3D; cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092</span><br><span class="line">a1.sources.r1.kafka.topics &#x3D; topic_start</span><br><span class="line">## Kafka-source2</span><br><span class="line">a1.sources.r2.type &#x3D; org.apache.flume.source.kafka.KafkaSource</span><br><span class="line">a1.sources.r2.batchSize &#x3D; 5000</span><br><span class="line">a1.sources.r2.batchDurationMillis &#x3D; 2000</span><br><span class="line">a1.sources.r2.kafka.bootstrap.servers &#x3D; cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092</span><br><span class="line">a1.sources.r2.kafka.topics &#x3D; topic_event</span><br><span class="line">    </span><br><span class="line">## channel1</span><br><span class="line">a1.channels.c1.type &#x3D; 
file</span><br><span class="line">## Checkpoint directory</span><br><span class="line">a1.channels.c1.checkpointDir &#x3D; &#x2F;root&#x2F;test&#x2F;flume&#x2F;checkpoint&#x2F;behavior1</span><br><span class="line">## Data (persistence) directory</span><br><span class="line">a1.channels.c1.dataDirs &#x3D; &#x2F;root&#x2F;test&#x2F;flume&#x2F;data&#x2F;behavior1&#x2F;</span><br><span class="line">a1.channels.c1.maxFileSize &#x3D; 2146435071</span><br><span class="line">a1.channels.c1.capacity &#x3D; 1000000</span><br><span class="line">a1.channels.c1.keep-alive &#x3D; 6</span><br><span class="line">## channel2</span><br><span class="line">a1.channels.c2.type &#x3D; file</span><br><span class="line">## Checkpoint directory</span><br><span class="line">a1.channels.c2.checkpointDir &#x3D; &#x2F;root&#x2F;test&#x2F;flume&#x2F;checkpoint&#x2F;behavior2</span><br><span class="line">## Data (persistence) directory</span><br><span class="line">a1.channels.c2.dataDirs &#x3D; &#x2F;root&#x2F;test&#x2F;flume&#x2F;data&#x2F;behavior2&#x2F;</span><br><span class="line">a1.channels.c2.maxFileSize &#x3D; 2146435071</span><br><span class="line">a1.channels.c2.capacity &#x3D; 1000000</span><br><span class="line">a1.channels.c2.keep-alive &#x3D; 6</span><br><span class="line">    </span><br><span class="line">## HDFS-sink1</span><br><span class="line">a1.sinks.k1.type &#x3D; hdfs</span><br><span class="line">a1.sinks.k1.hdfs.path &#x3D; &#x2F;origin_data&#x2F;gmall&#x2F;log&#x2F;topic_start&#x2F;%Y-%m-%d</span><br><span class="line">a1.sinks.k1.hdfs.filePrefix &#x3D; logstart-</span><br><span class="line">## HDFS-sink2</span><br><span class="line">a1.sinks.k2.type &#x3D; hdfs</span><br><span class="line">a1.sinks.k2.hdfs.path &#x3D; &#x2F;origin_data&#x2F;gmall&#x2F;log&#x2F;topic_event&#x2F;%Y-%m-%d</span><br><span class="line">a1.sinks.k2.hdfs.filePrefix &#x3D; logevent-</span><br><span class="line">    </span><br><span class="line">## Avoid lots of small files: roll at 128 MB (rollSize) and disable count-based rolling (rollCount 0); rollInterval is in seconds</span><br><span class="line">a1.sinks.k1.hdfs.rollInterval &#x3D; 10</span><br><span 
class="line">a1.sinks.k1.hdfs.rollSize &#x3D; 134217728</span><br><span class="line">a1.sinks.k1.hdfs.rollCount &#x3D; 0</span><br><span class="line">a1.sinks.k2.hdfs.rollInterval &#x3D; 50</span><br><span class="line">a1.sinks.k2.hdfs.rollSize &#x3D; 134217728</span><br><span class="line">a1.sinks.k2.hdfs.rollCount &#x3D; 0</span><br><span class="line">## Write compressed output files (CompressedStream) using the Snappy codec</span><br><span class="line">a1.sinks.k1.hdfs.fileType &#x3D; CompressedStream</span><br><span class="line">a1.sinks.k2.hdfs.fileType &#x3D; CompressedStream</span><br><span class="line">a1.sinks.k1.hdfs.codeC &#x3D; snappy</span><br><span class="line">a1.sinks.k2.hdfs.codeC &#x3D; snappy</span><br><span class="line">    </span><br><span class="line">## Wire the components together</span><br><span class="line">a1.sources.r1.channels &#x3D; c1</span><br><span class="line">a1.sinks.k1.channel &#x3D; c1</span><br><span class="line">a1.sources.r2.channels &#x3D; c2</span><br><span class="line">a1.sinks.k2.channel &#x3D; c2</span><br></pre></td></tr></table></figure><p><a name="blogTitle16"></a></p><h3 id="4-5-Flume启停脚本"><a href="#4-5-Flume启停脚本" class="headerlink" title="4.5 Flume启停脚本"></a>4.5 Flume Start/Stop Script (Kafka-&gt;HDFS)</h3><blockquote><p>Prepare this on the third server (cdh01.cm).</p></blockquote></li><li><p>vim /root/test/kafka-hdfs-flume.sh</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">#! 
&#x2F;bin&#x2F;bash</span><br><span class="line">case $1 in</span><br><span class="line">&quot;start&quot;)&#123;</span><br><span class="line">for i in cdh01.cm</span><br><span class="line">do</span><br><span class="line">echo &quot; --------Starting consumer Flume on $i -------&quot;</span><br><span class="line">ssh $i &quot;nohup flume-ng agent --conf-file &#x2F;root&#x2F;test&#x2F;kafka-flume-hdfs.conf --name a1 -Dflume.root.logger&#x3D;INFO,LOGFILE &gt;&#x2F;root&#x2F;test&#x2F;kafka-flume-hdfs.log 2&gt;&amp;1 &amp;&quot;</span><br><span class="line">done</span><br><span class="line">&#125;;;</span><br><span class="line">&quot;stop&quot;)&#123;</span><br><span class="line">for i in cdh01.cm</span><br><span class="line">do</span><br><span class="line">echo &quot; --------Stopping consumer Flume on $i -------&quot;</span><br><span class="line">ssh $i &quot;ps -ef | grep kafka-flume-hdfs | grep -v grep |awk &#39;&#123;print \$2&#125;&#39; | xargs kill&quot;</span><br><span class="line">done</span><br><span class="line">&#125;;;</span><br><span class="line">esac</span><br></pre></td></tr></table></figure></li></ul><p><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445281-63bb8866-411b-4829-b0b3-0d989bca1884.png#align=left&display=inline&height=571&margin=%5Bobject%20Object%5D&originHeight=571&originWidth=1159&size=0&status=done&style=none&width=1159" alt><br><a name="blogTitle17"></a></p><h2 id="5-业务数据"><a href="#5-业务数据" class="headerlink" title="5 业务数据"></a>5 Business Data</h2><blockquote><p>From this module on, the focus is enterprise reporting and decision-making. It supplies the data behind analysis, addresses the difficulty of producing reports quickly over large data volumes, and supports fast presentation of ad-hoc business requests. Separating offline from real-time workloads, and first managing data presentation offline, lays a solid foundation for the real-time solution.</p></blockquote><p><a name="blogTitle18"></a></p><h3 id="5-1-电商业务流程"><a href="#5-1-电商业务流程" class="headerlink" title="5.1 电商业务流程"></a>5.1 E-commerce Business Flow</h3><p><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445499-c8b438ec-8428-4251-83c3-daecf0ea0f47.png#align=left&display=inline&height=647&margin=%5Bobject%20Object%5D&originHeight=647&originWidth=959&size=0&status=done&style=none&width=959" alt><br><a name="blogTitle19"></a></p><h3 id="5-2-SKU-SPU"><a href="#5-2-SKU-SPU" class="headerlink" title="5.2 SKU-SPU"></a>5.2 SKU-SPU</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445306-c312505f-c425-4ba5-97ee-9970d00d11df.png#align=left&display=inline&height=603&margin=%5Bobject%20Object%5D&originHeight=603&originWidth=1244&size=0&status=done&style=none&width=1244" alt></p><ul><li>SKU (Stock Keeping Unit): the basic unit of inventory. The term has since come to mean a product's unique identifier, and every product has its own unique SKU number.</li><li>SPU (Standard Product Unit): the smallest unit for aggregating product information; a reusable, easily searchable set of standardized information.</li><li>In short: the Black Shark 3 phone is an SPU, while a Black Shark 3 in Armor Gray with 256 GB of storage is an SKU.<br><a name="blogTitle20"></a><h3 id="5-3-业务表结构"><a href="#5-3-业务表结构" class="headerlink" title="5.3 业务表结构"></a>5.3 Business Table Schemas</h3><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445320-2cc623a5-0587-4abf-a3ae-daaaf038854d.png#align=left&display=inline&height=682&margin=%5Bobject%20Object%5D&originHeight=682&originWidth=1134&size=0&status=done&style=none&width=1134" alt><br><a name="2a0538a3"></a><h4 id="5-3-1-订单表（order-info）"><a href="#5-3-1-订单表（order-info）" class="headerlink" title="5.3.1 订单表（order_info）"></a>5.3.1 Orders Table (order_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445293-807044ab-66a1-4421-823b-b7945c79f048.png#align=left&display=inline&height=370&margin=%5Bobject%20Object%5D&originHeight=370&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="e3c30ff7"></a><h4 id="5-3-2-订单详情表（order-detail）"><a href="#5-3-2-订单详情表（order-detail）" class="headerlink" title="5.3.2 订单详情表（order_detail）"></a>5.3.2 Order Details Table (order_detail)</h4><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445421-3fefa623-d4e1-4458-9247-ceafcc11bbd9.png#align=left&display=inline&height=188&margin=%5Bobject%20Object%5D&originHeight=188&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="ef4539a6"></a><h4 id="5-3-3-SKU-商品表（sku-info）"><a href="#5-3-3-SKU-商品表（sku-info）" class="headerlink" title="5.3.3 SKU 商品表（sku_info）"></a>5.3.3 SKU Product Table (sku_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445457-375551a4-5ba5-4fe2-8594-36a86445a0e9.png#align=left&display=inline&height=224&margin=%5Bobject%20Object%5D&originHeight=224&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="9ecdba66"></a><h4 id="5-3-4-用户表（user-info）"><a href="#5-3-4-用户表（user-info）" class="headerlink" title="5.3.4 用户表（user_info）"></a>5.3.4 User Table (user_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445409-4fcce741-b0a9-40fe-ae2a-e0669c51338b.png#align=left&display=inline&height=261&margin=%5Bobject%20Object%5D&originHeight=261&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="4097e291"></a><h4 id="5-3-5-商品一级分类表（base-category1）"><a href="#5-3-5-商品一级分类表（base-category1）" class="headerlink" title="5.3.5 商品一级分类表（base_category1）"></a>5.3.5 Level-1 Product Category Table (base_category1)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445599-b968e841-b432-4d76-87e3-e4e4990466c9.png#align=left&display=inline&height=93&margin=%5Bobject%20Object%5D&originHeight=93&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="edd3477b"></a><h4 id="5-3-6-商品二级分类表（base-category2）"><a href="#5-3-6-商品二级分类表（base-category2）" class="headerlink" title="5.3.6 商品二级分类表（base_category2）"></a>5.3.6 Level-2 Product Category Table (base_category2)</h4><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445399-404c5075-6615-4886-8101-ae6a9a2ea836.png#align=left&display=inline&height=99&margin=%5Bobject%20Object%5D&originHeight=99&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="761365da"></a><h4 id="5-3-7-商品三级分类表（base-category3）"><a href="#5-3-7-商品三级分类表（base-category3）" class="headerlink" title="5.3.7 商品三级分类表（base_category3）"></a>5.3.7 Level-3 Product Category Table (base_category3)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445762-7d1a4322-7ef5-4fe2-bb8b-ef97c0bcbb9a.png#align=left&display=inline&height=103&margin=%5Bobject%20Object%5D&originHeight=103&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="d73753a2"></a><h4 id="5-3-8-支付流水表（payment-info）"><a href="#5-3-8-支付流水表（payment-info）" class="headerlink" title="5.3.8 支付流水表（payment_info）"></a>5.3.8 Payment Transaction Table (payment_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446820-0341ff71-434e-4a82-8036-eba14a8476b9.png#align=left&display=inline&height=199&margin=%5Bobject%20Object%5D&originHeight=199&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="8e375166"></a><h4 id="5-3-9-省份表（base-province）"><a href="#5-3-9-省份表（base-province）" class="headerlink" title="5.3.9 省份表（base_province）"></a>5.3.9 Province Table (base_province)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445408-12faad35-63c2-4718-ac13-83b26354b3c3.png#align=left&display=inline&height=119&margin=%5Bobject%20Object%5D&originHeight=119&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="d0920d19"></a><h4 id="5-3-10-地区表（base-region）"><a href="#5-3-10-地区表（base-region）" class="headerlink" title="5.3.10 地区表（base_region）"></a>5.3.10 Region Table (base_region)</h4><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445709-051d048d-66ef-40a7-981a-8538094858a6.png#align=left&display=inline&height=72&margin=%5Bobject%20Object%5D&originHeight=72&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="3df9abe4"></a><h4 id="5-3-11-品牌表（base-trademark）"><a href="#5-3-11-品牌表（base-trademark）" class="headerlink" title="5.3.11 品牌表（base_trademark）"></a>5.3.11 Brand Table (base_trademark)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445417-956df41f-c8d8-43e9-a9ef-478c4e16fa95.png#align=left&display=inline&height=106&margin=%5Bobject%20Object%5D&originHeight=106&originWidth=1277&size=0&status=done&style=none&width=1277" alt><br><a name="e1eb5124"></a><h4 id="5-3-12-订单状态表（order-status-log）"><a href="#5-3-12-订单状态表（order-status-log）" class="headerlink" title="5.3.12 订单状态表（order_status_log）"></a>5.3.12 Order Status Log Table (order_status_log)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445403-6ebe9cc2-6007-4e39-9647-1d63cb9886f4.png#align=left&display=inline&height=118&margin=%5Bobject%20Object%5D&originHeight=118&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="dbb7a316"></a><h4 id="5-3-13-SPU-商品表（spu-info）"><a href="#5-3-13-SPU-商品表（spu-info）" class="headerlink" title="5.3.13 SPU 商品表（spu_info）"></a>5.3.13 SPU Product Table (spu_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445616-7c564b8c-e4be-4942-a245-5a986b440689.png#align=left&display=inline&height=136&margin=%5Bobject%20Object%5D&originHeight=136&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="afc9aaeb"></a><h4 id="5-3-14-商品评论表（comment-info）"><a href="#5-3-14-商品评论表（comment-info）" class="headerlink" title="5.3.14 商品评论表（comment_info）"></a>5.3.14 Product Review Table (comment_info)</h4><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445432-2a0d5fa0-0fcf-45b9-9d62-a3c97be4ced3.png#align=left&display=inline&height=260&margin=%5Bobject%20Object%5D&originHeight=260&originWidth=1132&size=0&status=done&style=none&width=1132" alt><br><a name="3cd84c4c"></a><h4 id="5-3-15-退单表（order-refund-info）"><a href="#5-3-15-退单表（order-refund-info）" class="headerlink" title="5.3.15 退单表（order_refund_info）"></a>5.3.15 Order Refund Table (order_refund_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445415-560c225d-39ad-4e97-ae5e-feded91e61bc.png#align=left&display=inline&height=211&margin=%5Bobject%20Object%5D&originHeight=211&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="86808c1f"></a><h4 id="5-3-16-加入购物车表（cart-info）"><a href="#5-3-16-加入购物车表（cart-info）" class="headerlink" title="5.3.16 加入购物车表（cart_info）"></a>5.3.16 Shopping Cart Table (cart_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445420-ec008b1a-ac25-4d67-8f30-a4f22f8bf986.png#align=left&display=inline&height=229&margin=%5Bobject%20Object%5D&originHeight=229&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="e396f00d"></a><h4 id="5-3-17-商品收藏表（favor-info）"><a href="#5-3-17-商品收藏表（favor-info）" class="headerlink" title="5.3.17 商品收藏表（favor_info）"></a>5.3.17 Product Favorites Table (favor_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445413-5bc2fefc-43ea-4198-962a-68e04c3693b5.png#align=left&display=inline&height=167&margin=%5Bobject%20Object%5D&originHeight=167&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="599c2d57"></a><h4 id="5-3-18-优惠券领用表（coupon-use）"><a href="#5-3-18-优惠券领用表（coupon-use）" class="headerlink" title="5.3.18 优惠券领用表（coupon_use）"></a>5.3.18 Coupon Usage Table (coupon_use)</h4><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445440-5134d8aa-2085-457c-bb47-4ba611528bc1.png#align=left&display=inline&height=193&margin=%5Bobject%20Object%5D&originHeight=193&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="597f8aa7"></a><h4 id="5-3-19-优惠券表（coupon-info）"><a href="#5-3-19-优惠券表（coupon-info）" class="headerlink" title="5.3.19 优惠券表（coupon_info）"></a>5.3.19 Coupon Table (coupon_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445469-06ad320b-4efb-4e48-b359-66b5a0180987.png#align=left&display=inline&height=306&margin=%5Bobject%20Object%5D&originHeight=306&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="14b73501"></a><h4 id="5-3-20-活动表（activity-info）"><a href="#5-3-20-活动表（activity-info）" class="headerlink" title="5.3.20 活动表（activity_info）"></a>5.3.20 Activity Table (activity_info)</h4><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445480-d02fd607-dfe9-47b8-992f-8f675460e400.png#align=left&display=inline&height=172&margin=%5Bobject%20Object%5D&originHeight=172&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="23b08d0d"></a><h4 id="5-3-21-活动订单关联表（activity-order）"><a href="#5-3-21-活动订单关联表（activity-order）" class="headerlink" title="5.3.21 活动订单关联表（activity_order）"></a>5.3.21 Activity-Order Association Table (activity_order)</h4></li></ul><p><a name="f2c83679"></a></p><h4 id="5-3-22-优惠规则表（activity-rule）"><a href="#5-3-22-优惠规则表（activity-rule）" class="headerlink" title="5.3.22 优惠规则表（activity_rule）"></a>5.3.22 Promotion Rule Table (activity_rule)</h4><p><a name="c05cf4cd"></a></p><h4 id="5-3-23-编码字典表（base-dic）"><a href="#5-3-23-编码字典表（base-dic）" class="headerlink" title="5.3.23 编码字典表（base_dic）"></a>5.3.23 Code Dictionary Table (base_dic)</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445580-ef8aff78-4e6e-4cfa-8a34-5cb0929bbf8f.png#align=left&display=inline&height=133&margin=%5Bobject%20Object%5D&originHeight=133&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a 
name="41ae15c6"></a></p><h4 id="5-3-24-活动参与商品表（activity-sku）"><a href="#5-3-24-活动参与商品表（activity-sku）" class="headerlink" title="5.3.24 活动参与商品表（activity_sku）"></a>5.3.24 Activity SKU Table (activity_sku)</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445455-d299fd9e-7ed4-4419-a4eb-41e0d3af4c25.png#align=left&display=inline&height=181&margin=%5Bobject%20Object%5D&originHeight=181&originWidth=1279&size=0&status=done&style=none&width=1279" alt><br><a name="blogTitle21"></a></p><h3 id="5-4-时间表结构"><a href="#5-4-时间表结构" class="headerlink" title="5.4 时间表结构"></a>5.4 Date Table Schemas</h3><p><a name="f09b2786"></a></p><h4 id="5-4-1-时间表（date-info）"><a href="#5-4-1-时间表（date-info）" class="headerlink" title="5.4.1 时间表（date_info）"></a>5.4.1 Date Table (date_info)</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445509-9e9b8580-ed76-4d31-9524-0d9d35af8e7e.png#align=left&display=inline&height=194&margin=%5Bobject%20Object%5D&originHeight=194&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="f703d98c"></a></p><h4 id="5-4-2-假期表（holiday-info）"><a href="#5-4-2-假期表（holiday-info）" class="headerlink" title="5.4.2 假期表（holiday_info）"></a>5.4.2 Holiday Table (holiday_info)</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445498-cace1b91-fd1c-495e-ae9e-5b20b950f181.png#align=left&display=inline&height=80&margin=%5Bobject%20Object%5D&originHeight=80&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="421431fc"></a></p><h4 id="5-4-3-假期年表（holiday-year）"><a href="#5-4-3-假期年表（holiday-year）" class="headerlink" title="5.4.3 假期年表（holiday_year）"></a>5.4.3 Holiday Year Table (holiday_year)</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445585-dfcd1dca-4af1-4822-98c0-729ff17ad0cd.png#align=left&display=inline&height=105&margin=%5Bobject%20Object%5D&originHeight=105&originWidth=820&size=0&status=done&style=none&width=820" alt><br><a name="blogTitle22"></a></p><h2 id="6-同步策略及数仓分层"><a href="#6-同步策略及数仓分层" class="headerlink" 
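title="6 同步策略及数仓分层"></a>6 Sync Strategies and Warehouse Layering</h2><blockquote><p>The data sync strategies are: full tables, incremental tables, and new-and-changed tables, plus special tables that are handled separately.</p></blockquote><ul><li>Full table: one partition per day, holding the complete data set.<br></li><li>Incremental table: one partition per day, holding only the rows added that day.<br></li><li>New-and-changed table: one partition per day, holding the rows added or modified that day.<br></li><li>Special table: not partitioned; stored only once.<br><br><a name="blogTitle23"></a><h3 id="6-1-全量策略"><a href="#6-1-全量策略" class="headerlink" title="6.1 全量策略"></a>6.1 Full Sync Strategy</h3><blockquote><p>Daily full load: a complete copy of the data is stored every day as one partition.<br>Suitable when: the table is small and its rows are both added and modified.<br>Examples: the brand, code dictionary, product category, promotion rule, activity, product, shopping cart, favorites, and SKU/SPU tables</p></blockquote></li></ul><p><a name="blogTitle24"></a></p><h3 id="6-2-增量策略"><a href="#6-2-增量策略" class="headerlink" title="6.2 增量策略"></a>6.2 Incremental Sync Strategy</h3><blockquote><p>Daily increment: each day's newly added data is stored as one partition.<br>Suitable when: the table is large and rows are only ever inserted.<br>Examples: the refund, order status log, payment transaction, order detail, activity-order association, and product review tables</p></blockquote><p><a name="blogTitle25"></a></p><h3 id="6-3-新增及变化策略"><a href="#6-3-新增及变化策略" class="headerlink" title="6.3 新增及变化策略"></a>6.3 New-and-Changed Sync Strategy</h3><blockquote><p>Daily new and changed data: rows whose create time or operate time falls on the current day are stored as one partition.<br>Suitable when: the table is large and rows are both inserted and updated.<br>Examples: the user, order, and coupon usage tables.</p></blockquote><p><a name="blogTitle26"></a></p><h3 id="6-4-特殊策略"><a href="#6-4-特殊策略" class="headerlink" title="6.4 特殊策略"></a>6.4 Special Strategy</h3><blockquote><p>Some special dimension tables need not follow the strategies above: they are synced into the warehouse only once and are not updated, because their data does not change.<br>Suitable when: the table data almost never changes.<br>1. Real-world dimensions: dimensions of the objective world that do not change (e.g. gender, region, ethnicity, political affiliation, shoe size) can be stored once as a fixed set of values.<br>2. Date dimension: one or several years of dates can be imported in a single load.<br>3. Region dimension: the province and region tables.</p></blockquote><p><a name="blogTitle27"></a></p><h3 id="6-5-分析业务表同步策略"><a href="#6-5-分析业务表同步策略" class="headerlink" title="6.5 分析业务表同步策略"></a>6.5 Sync Strategies for the Analyzed Business Tables</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445558-f3526a7c-422a-4254-b832-a0edb969ecd0.png#align=left&display=inline&height=577&margin=%5Bobject%20Object%5D&originHeight=577&originWidth=1161&size=0&status=done&style=none&width=1161" alt></p><blockquote><p>Even special tables may change slowly (a war that shifts territorial borders, for example, could alter the region table), so they too use the partitioned full sync strategy.</p></blockquote><p><a name="blogTitle28"></a></p><h3 id="6-6-数仓分层"><a href="#6-6-数仓分层" class="headerlink" title="6.6 数仓分层"></a>6.6 Data Warehouse Layering</h3><p><img 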
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445503-1f00df37-1048-4dab-bd2a-b93874b97ed6.png#align=left&display=inline&height=563&margin=%5Bobject%20Object%5D&originHeight=563&originWidth=1141&size=0&status=done&style=none&width=1141" alt></p><ul><li><p>Why layer the warehouse:</p><ul><li>Simplicity: a complex task is broken into layers, each handling its own job, which makes problems easier to locate.</li><li>Less duplicated development: with standardized layers, intermediate-layer data greatly reduces repeated computation and makes results reusable.</li><li>Data isolation: whether the concern is data anomalies or data sensitivity, layering decouples the raw data from the statistical data.</li><li>Dimensional modeling is usually done in the DWD layer.</li></ul></li><li><p>ODS layer: the raw data layer, storing the original data.</p></li><li><p>DWD layer: cleans the ODS data (removing nulls and dirty records, converting types, etc.), degenerates dimensions, and masks sensitive fields to protect privacy.</p></li><li><p>DWS layer: daily aggregates built on DWD.</p></li><li><p>DWT layer: subject-oriented aggregates built on DWS.</p></li><li><p>ADS layer: serves data to the various analysis reports.<br><a name="blogTitle29"></a></p><h2 id="7-Sqoop同步数据"><a href="#7-Sqoop同步数据" class="headerlink" title="7 Sqoop同步数据"></a>7 Syncing Data with Sqoop</h2><blockquote><p>Sqoop caveat:<br>Hive stores NULL on disk as “\N” (in its default text format), whereas MySQL stores a real NULL. To keep both sides consistent:</p><ul><li>when exporting data (Hive to MySQL), use <code>--input-null-string</code> and <code>--input-null-non-string</code>;</li><li>when importing data (MySQL to Hive), use <code>--null-string</code> and <code>--null-non-string</code>.</li></ul></blockquote><p>The approach in this example: Sqoop extracts the MySQL data to HDFS as Parquet files, and the Hive ODS tables are then created on top of that data.</p><blockquote><p>The scripts are scheduled and executed with DolphinScheduler.</p></blockquote></li><li><p>Column type mapping for Sqoop ingestion from MySQL into Hive</p><table><thead><tr><th>MySQL column type</th><th>Hive ODS column type</th><th>Hive DWD-ADS column type</th></tr></thead><tbody><tr><td>tinyint</td><td>tinyint</td><td>tinyint</td></tr><tr><td>int</td><td>int</td><td>int</td></tr><tr><td>bigint</td><td>bigint</td><td>bigint</td></tr><tr><td>varchar</td><td>string</td><td>string</td></tr><tr><td>datetime</td><td>bigint</td><td>string</td></tr><tr><td>bit</td><td>boolean</td><td>int</td></tr><tr><td>double</td><td>double</td><td>double</td></tr><tr><td>decimal</td><td>decimal</td><td>decimal</td></tr></tbody></table></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445576-95ae80a2-e157-4ba2-ac2f-39c0057bfc4d.png#align=left&display=inline&height=820&margin=%5Bobject%20Object%5D&originHeight=820&originWidth=1920&size=0&status=done&style=none&width=1920" alt><br><a 
name="blogTitle30"></a></p><h2 id="8-ods层构建"><a href="#8-ods层构建" class="headerlink" title="8 ods层构建"></a>8 Building the ODS Layer</h2><p><a name="blogTitle31"></a></p><h3 id="8-1-ods建表"><a href="#8-1-ods建表" class="headerlink" title="8.1 ods建表"></a>8.1 Creating the ODS Tables</h3><blockquote><p>Create the ods database in Hive, then create a Hive data source in DolphinScheduler; this Hive database must be selected when creating the DAG.<br>Create the dwd, dws, dwt, and ads databases at the same time.</p></blockquote><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445577-7f5949c5-01ad-4097-b19b-0ebe0cfd3546.png#align=left&display=inline&height=625&margin=%5Bobject%20Object%5D&originHeight=625&originWidth=847&size=0&status=done&style=none&width=847" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445447507-0aa01c2c-12fc-4d51-ba3e-a055609bcb47.png#align=left&display=inline&height=682&margin=%5Bobject%20Object%5D&originHeight=682&originWidth=1134&size=0&status=done&style=none&width=1134" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445580-441c434e-54a5-49d6-973a-ef7d97c1b4ac.png#align=left&display=inline&height=577&margin=%5Bobject%20Object%5D&originHeight=577&originWidth=1161&size=0&status=done&style=none&width=1161" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445560-1a8ada2f-b802-4db9-b161-788f80c89770.png#align=left&display=inline&height=683&margin=%5Bobject%20Object%5D&originHeight=683&originWidth=1168&size=0&status=done&style=none&width=1168" alt></p><ol><li><p>base_dic</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td 
class="code"><pre><span class="line">drop table if exists ods.mall__base_dic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__base_dic&#96;(</span><br><span class="line">  &#96;dic_code&#96; string COMMENT &#39;code&#39;,</span><br><span class="line">  &#96;dic_name&#96; string COMMENT &#39;code name&#39;,</span><br><span class="line">  &#96;parent_code&#96; string COMMENT &#39;parent code&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;create time&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;update time&#39;</span><br><span class="line">  ) COMMENT &#39;code dictionary table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_dic&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>base_trademark</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_trademark;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__base_trademark&#96;(</span><br><span class="line">  &#96;tm_id&#96; string COMMENT &#39;brand ID&#39;,</span><br><span class="line">  &#96;tm_name&#96; string COMMENT &#39;brand name&#39;</span><br><span 
class="line">  ) COMMENT &#39;brand table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_trademark&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>base_category3</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_category3;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__base_category3&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;name&#96; string COMMENT &#39;level-3 category name&#39;,</span><br><span class="line">  &#96;category2_id&#96; bigint COMMENT &#39;level-2 category ID&#39;</span><br><span class="line">  ) COMMENT &#39;level-3 category table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_category3&#x2F;&#39;</span><br><span 
class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>base_category2</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_category2;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__base_category2&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;name&#96; string COMMENT &#39;level-2 category name&#39;,</span><br><span class="line">  &#96;category1_id&#96; bigint COMMENT &#39;level-1 category ID&#39;</span><br><span class="line">  ) COMMENT &#39;level-2 category table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_category2&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>base_category1</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_category1;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__base_category1&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;name&#96; string COMMENT &#39;category name&#39;</span><br><span class="line">  ) COMMENT &#39;level-1 category table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_category1&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>activity_rule</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__activity_rule;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__activity_rule&#96;(</span><br><span class="line">  &#96;id&#96; int COMMENT &#39;ID&#39;,</span><br><span class="line">  
&#96;activity_id&#96; int COMMENT &#39;activity ID&#39;,</span><br><span class="line">  &#96;condition_amount&#96; decimal(16,2) COMMENT &#39;minimum spend amount&#39;,</span><br><span class="line">  &#96;condition_num&#96; bigint COMMENT &#39;minimum item count&#39;,</span><br><span class="line">  &#96;benefit_amount&#96; decimal(16,2) COMMENT &#39;discount amount&#39;,</span><br><span class="line">  &#96;benefit_discount&#96; bigint COMMENT &#39;discount rate&#39;,</span><br><span class="line">  &#96;benefit_level&#96; bigint COMMENT &#39;promotion tier&#39;</span><br><span class="line">  ) COMMENT &#39;promotion rule table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;activity_rule&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>activity_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__activity_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__activity_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;activity ID&#39;,</span><br><span class="line">  
&#96;activity_name&#96; string COMMENT &#39;activity name&#39;,</span><br><span class="line">  &#96;activity_type&#96; string COMMENT &#39;activity type&#39;,</span><br><span class="line">  &#96;start_time&#96; bigint COMMENT &#39;start time&#39;,</span><br><span class="line">  &#96;end_time&#96; bigint COMMENT &#39;end time&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;create time&#39;</span><br><span class="line">  ) COMMENT &#39;activity table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;activity_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>activity_sku</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__activity_sku;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__activity_sku&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;activity_id&#96; bigint COMMENT &#39;activity ID&#39;,</span><br><span class="line">  &#96;sku_id&#96; bigint COMMENT &#39;sku_id&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint 
COMMENT &#39;create time&#39;</span><br><span class="line">  ) COMMENT &#39;activity SKU table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;activity_sku&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>cart_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__cart_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__cart_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;user ID&#39;,</span><br><span class="line">  &#96;sku_id&#96; bigint COMMENT &#39;sku_id&#39;,</span><br><span class="line">  &#96;cart_price&#96; decimal(10,2) COMMENT &#39;price when added to cart&#39;,</span><br><span class="line">  &#96;sku_num&#96; bigint COMMENT &#39;quantity&#39;,</span><br><span class="line">  &#96;sku_name&#96; string COMMENT 
&#39;SKU name&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;create time&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;update time&#39;,</span><br><span class="line">  &#96;is_ordered&#96; bigint COMMENT &#39;whether already ordered&#39;,</span><br><span class="line">  &#96;order_time&#96; bigint COMMENT &#39;order time&#39;</span><br><span class="line">  ) COMMENT &#39;shopping cart table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;cart_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>favor_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__favor_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods&#96;.&#96;mall__favor_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;ID&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;user ID&#39;,</span><br><span class="line">  &#96;sku_id&#96; bigint COMMENT &#39;sku_id&#39;,</span><br><span 
class="line">  &#96;spu_id&#96; bigint COMMENT &#39;商品id&#39;,</span><br><span class="line">  &#96;is_cancel&#96; string COMMENT &#39;是否已取消 0 正常 1 已取消&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;,</span><br><span class="line">  &#96;cancel_time&#96; bigint COMMENT &#39;修改时间&#39;</span><br><span class="line">  ) COMMENT &#39;商品收藏表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;favor_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>coupon_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__coupon_info</span><br><span class="line">CREATE EXTERNAL TABLE 
&#96;ods.mall__coupon_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;购物券编号&#39;,</span><br><span class="line">  &#96;coupon_name&#96; string COMMENT &#39;购物券名称&#39;,</span><br><span class="line">  &#96;coupon_type&#96; string COMMENT &#39;购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券&#39;,</span><br><span class="line">  &#96;condition_amount&#96; decimal(10,2) COMMENT &#39;满额数&#39;,</span><br><span class="line">  &#96;condition_num&#96; bigint COMMENT &#39;满件数&#39;,</span><br><span class="line">  &#96;activity_id&#96; bigint COMMENT &#39;活动编号&#39;,</span><br><span class="line">  &#96;benefit_amount&#96; decimal(16,2) COMMENT &#39;减金额&#39;,</span><br><span class="line">  &#96;benefit_discount&#96; bigint COMMENT &#39;折扣&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;,</span><br><span class="line">  &#96;range_type&#96; string COMMENT &#39;范围类型 1、商品 2、品类 3、品牌&#39;,</span><br><span class="line">  &#96;spu_id&#96; bigint COMMENT &#39;商品id&#39;,</span><br><span class="line">  &#96;tm_id&#96; bigint COMMENT &#39;品牌id&#39;,</span><br><span class="line">  &#96;category3_id&#96; bigint COMMENT &#39;品类id&#39;,</span><br><span class="line">  &#96;limit_num&#96; int COMMENT &#39;最多领用次数&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;修改时间&#39;,</span><br><span class="line">  &#96;expire_time&#96; bigint COMMENT &#39;过期时间&#39;</span><br><span class="line">  ) COMMENT &#39;优惠券表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;coupon_info&#x2F;&#39;</span><br><span class="line">tblproperties 
(&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>sku_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__sku_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__sku_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;skuid&#39;,</span><br><span class="line">  &#96;spu_id&#96; bigint COMMENT &#39;spuid&#39;,</span><br><span class="line">  &#96;price&#96; decimal(10,0) COMMENT &#39;价格&#39;,</span><br><span class="line">  &#96;sku_name&#96; string COMMENT &#39;sku名称&#39;,</span><br><span class="line">  &#96;sku_desc&#96; string COMMENT &#39;商品规格描述&#39;,</span><br><span class="line">  &#96;weight&#96; decimal(10,2) COMMENT &#39;重量&#39;,</span><br><span class="line">  &#96;tm_id&#96; bigint COMMENT &#39;品牌&#39;,</span><br><span class="line">  &#96;category3_id&#96; bigint COMMENT &#39;三级分类id&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;</span><br><span class="line">  ) COMMENT &#39;库存单元表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by 
&#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;sku_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>spu_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__spu_info</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__spu_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;商品id&#39;,</span><br><span class="line">  &#96;spu_name&#96; string COMMENT &#39;商品名称&#39;,</span><br><span class="line">  &#96;category3_id&#96; bigint COMMENT &#39;三级分类id&#39;,</span><br><span class="line">  &#96;tm_id&#96; bigint COMMENT &#39;品牌id&#39;</span><br><span class="line">  ) COMMENT &#39;商品表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;spu_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>base_province</p><figure class="highlight plain"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_province;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__base_province&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;id&#39;,</span><br><span class="line">  &#96;name&#96; string COMMENT &#39;省名称&#39;,</span><br><span class="line">  &#96;region_id&#96; string COMMENT &#39;大区id&#39;,</span><br><span class="line">  &#96;area_code&#96; string COMMENT &#39;行政区位码&#39;,</span><br><span class="line">  &#96;iso_code&#96; string COMMENT &#39;国际编码&#39;</span><br><span class="line">  ) COMMENT &#39;省份表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_province&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>base_region</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__base_region;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__base_region&#96;(</span><br><span class="line">  &#96;id&#96; string COMMENT &#39;大区id&#39;,</span><br><span class="line">  &#96;region_name&#96; string COMMENT &#39;大区名称&#39;</span><br><span class="line">  ) COMMENT &#39;地区表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;base_region&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>refund_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__order_refund_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__order_refund_info&#96;(</span><br><span class="line">  
&#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户id&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;sku_id&#96; bigint COMMENT &#39;skuid&#39;,</span><br><span class="line">  &#96;refund_type&#96; string COMMENT &#39;退款类型&#39;,</span><br><span class="line">  &#96;refund_num&#96; bigint COMMENT &#39;退货件数&#39;,</span><br><span class="line">  &#96;refund_amount&#96; decimal(16,2) COMMENT &#39;退款金额&#39;,</span><br><span class="line">  &#96;refund_reason_type&#96; string COMMENT &#39;原因类型&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;</span><br><span class="line">  ) COMMENT &#39;退单表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;order_refund_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>order_status_log</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__order_status_log</span><br><span class="line">CREATE EXTERNAL 
TABLE &#96;ods.mall__order_status_log&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;order_status&#96; string COMMENT &#39;订单状态&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;操作时间&#39;</span><br><span class="line">  ) COMMENT &#39;订单状态表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;order_status_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>payment_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__payment_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__payment_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;out_trade_no&#96; string COMMENT 
&#39;对外业务编号&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户编号&#39;,</span><br><span class="line">  &#96;alipay_trade_no&#96; string COMMENT &#39;支付宝交易流水编号&#39;,</span><br><span class="line">  &#96;total_amount&#96; decimal(16,2) COMMENT &#39;支付金额&#39;,</span><br><span class="line">  &#96;subject&#96; string COMMENT &#39;交易内容&#39;,</span><br><span class="line">  &#96;payment_type&#96; string COMMENT &#39;支付方式&#39;,</span><br><span class="line">  &#96;payment_time&#96; bigint COMMENT &#39;支付时间&#39;</span><br><span class="line">  ) COMMENT &#39;支付流水表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;payment_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>order_detail</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__order_detail;</span><br><span 
class="line">CREATE EXTERNAL TABLE &#96;ods.mall__order_detail&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户id&#39;,</span><br><span class="line">  &#96;sku_id&#96; bigint COMMENT &#39;sku_id&#39;,</span><br><span class="line">  &#96;sku_name&#96; string COMMENT &#39;sku名称&#39;,</span><br><span class="line">  &#96;order_price&#96; decimal(10,2) COMMENT &#39;购买价格(下单时sku价格）&#39;,</span><br><span class="line">  &#96;sku_num&#96; string COMMENT &#39;购买个数&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;</span><br><span class="line">  ) COMMENT &#39;订单明细表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;order_detail&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>activity_order</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__activity_order</span><br><span class="line">CREATE 
EXTERNAL TABLE &#96;ods.mall__activity_order&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;activity_id&#96; bigint COMMENT &#39;活动id&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;发生日期&#39;</span><br><span class="line">  ) COMMENT &#39;活动与订单关联表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;activity_order&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>comment_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__comment_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__comment_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户id&#39;,</span><br><span class="line">  
&#96;sku_id&#96; bigint COMMENT &#39;skuid&#39;,</span><br><span class="line">  &#96;spu_id&#96; bigint COMMENT &#39;商品id&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单编号&#39;,</span><br><span class="line">  &#96;appraise&#96; string COMMENT &#39;评价 1 好评 2 中评 3 差评&#39;,</span><br><span class="line">  &#96;comment_txt&#96; string COMMENT &#39;评价内容&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;</span><br><span class="line">  ) COMMENT &#39;商品评论表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;comment_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>coupon_use</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__coupon_use</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__coupon_use&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span 
class="line">  &#96;coupon_id&#96; bigint COMMENT &#39;购物券ID&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户ID&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;订单ID&#39;,</span><br><span class="line">  &#96;coupon_status&#96; string COMMENT &#39;购物券状态&#39;,</span><br><span class="line">  &#96;get_time&#96; bigint COMMENT &#39;领券时间&#39;,</span><br><span class="line">  &#96;using_time&#96; bigint COMMENT &#39;使用时间&#39;,</span><br><span class="line">  &#96;used_time&#96; bigint COMMENT &#39;过期时间&#39;</span><br><span class="line">  ) COMMENT &#39;优惠券领用表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;coupon_use&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>user_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__user_info</span><br><span class="line">CREATE EXTERNAL TABLE 
&#96;ods.mall__user_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;name&#96; string COMMENT &#39;用户姓名&#39;,</span><br><span class="line">  &#96;email&#96; string COMMENT &#39;邮箱&#39;,</span><br><span class="line">  &#96;user_level&#96; string COMMENT &#39;用户级别&#39;,</span><br><span class="line">  &#96;birthday&#96; bigint COMMENT &#39;用户生日&#39;,</span><br><span class="line">  &#96;gender&#96; string COMMENT &#39;性别 M男,F女&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;修改时间&#39;</span><br><span class="line">  ) COMMENT &#39;用户表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;user_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>order_info</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__order_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__order_info&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;编号&#39;,</span><br><span class="line">  &#96;final_total_amount&#96; decimal(16,2) COMMENT &#39;总金额&#39;,</span><br><span class="line">  &#96;order_status&#96; string COMMENT &#39;订单状态&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;用户id&#39;,</span><br><span class="line">  &#96;out_trade_no&#96; string COMMENT &#39;订单交易编号（第三方支付用）&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;创建时间&#39;,</span><br><span class="line">  &#96;operate_time&#96; bigint COMMENT &#39;操作时间&#39;,</span><br><span class="line">  &#96;province_id&#96; int COMMENT &#39;地区&#39;,</span><br><span class="line">  &#96;benefit_reduce_amount&#96; decimal(16,2) COMMENT &#39;优惠金额&#39;,</span><br><span class="line">  &#96;original_total_amount&#96; decimal(16,2) COMMENT &#39;原价金额&#39;,</span><br><span class="line">  &#96;feight_fee&#96; decimal(16,2) COMMENT &#39;运费&#39;</span><br><span class="line">  ) COMMENT &#39;订单表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;order_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>start_log</p><blockquote><p>Raw app-start log table, populated by client-side event tracking.</p></blockquote></li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__start_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__start_log&#96;(</span><br><span class="line">  &#96;line&#96; string COMMENT &#39;启动日志&#39;</span><br><span class="line">  ) COMMENT &#39;启动日志表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;start_log&#x2F;&#39;;</span><br></pre></td></tr></table></figure><ol start="26"><li>event_log<blockquote><p>Raw event log table, populated by client-side event tracking.</p></blockquote></li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__event_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__event_log&#96;(</span><br><span class="line">  &#96;line&#96; string COMMENT &#39;事件日志&#39;</span><br><span class="line">  ) COMMENT &#39;事件日志表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;event_log&#x2F;&#39;;</span><br></pre></td></tr></table></figure><ol start="27"><li>date_info<blockquote><p>Date (time dimension) table.</p></blockquote></li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ods.mall__date_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ods.mall__date_info&#96;(</span><br><span class="line">  &#96;date_id&#96; int COMMENT &#39;日&#39;,</span><br><span class="line">  &#96;week_id&#96; int COMMENT &#39;周&#39;,</span><br><span class="line">  &#96;week_day&#96; int COMMENT &#39;周的第几天&#39;,</span><br><span class="line">  &#96;day&#96; int COMMENT &#39;每月的第几天&#39;,</span><br><span class="line">  &#96;month&#96; int COMMENT &#39;第几月&#39;,</span><br><span class="line">  &#96;quarter&#96; int COMMENT &#39;第几季度&#39;,</span><br><span class="line">  &#96;year&#96; int COMMENT &#39;年&#39;,</span><br><span class="line">  &#96;is_workday&#96; int COMMENT &#39;是否是工作日&#39;,</span><br><span class="line">  &#96;holiday_id&#96; int COMMENT &#39;是否是节假日&#39;</span><br><span class="line">  ) COMMENT &#39;时间维度表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by 
&#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ods&#x2F;mall&#x2F;date_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure><p><a name="blogTitle32"></a></p><h3 id="8-2-mysql数据抽取"><a href="#8-2-mysql数据抽取" class="headerlink" title="8.2 mysql数据抽取"></a>8.2 mysql数据抽取</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445550-cccdcf34-74d8-42ae-ba5e-2c3769ca3c9e.png#align=left&display=inline&height=77&margin=%5Bobject%20Object%5D&originHeight=77&originWidth=1255&size=0&status=done&style=none&width=1255" alt></p><ul><li><p>sqoop抽取脚本基础</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">mysql_db_name&#x3D;$&#123;db_name&#125;</span><br><span class="line">mysql_db_addr&#x3D;$&#123;db_addr&#125;</span><br><span 
class="line">mysql_db_user&#x3D;$&#123;db_user&#125;</span><br><span class="line">mysql_db_password&#x3D;$&#123;db_password&#125;</span><br><span class="line"># 如果是输入的日期按照取输入日期；如果没输入日期取当前时间的前一天</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">echo &quot;日期:&quot;$db_date</span><br><span class="line">echo &quot;mysql库名:&quot;$mysql_db_name</span><br><span class="line">import_data() &#123;</span><br><span class="line">&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;sqoop import \</span><br><span class="line">--connect jdbc:mysql:&#x2F;&#x2F;$mysql_db_addr:3306&#x2F;$mysql_db_name?tinyInt1isBit&#x3D;false \</span><br><span class="line">--username $mysql_db_user \</span><br><span class="line">--password $mysql_db_password \</span><br><span class="line">--target-dir &#x2F;origin_data&#x2F;$mysql_db_name&#x2F;$1&#x2F;$db_date \</span><br><span class="line">--delete-target-dir \</span><br><span class="line">--num-mappers 1 \</span><br><span class="line">--null-string &#39;&#39; \</span><br><span class="line">--null-non-string &#39;\\n&#39; \</span><br><span class="line">--fields-terminated-by &quot;\t&quot; \</span><br><span class="line">--query &quot;$2&quot;&#39; and $CONDITIONS;&#39; \</span><br><span class="line">--as-parquetfile </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p>DolphinScheduler全局参数</p></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445578-aeef9cf0-c4cf-4a5a-91ea-02ec01c0bf61.png#align=left&display=inline&height=622&margin=%5Bobject%20Object%5D&originHeight=622&originWidth=981&size=0&status=done&style=none&width=981" 
alt></p><table><thead><tr><th>Parameter</th><th>Description</th></tr></thead><tbody><tr><td>date</td><td>Date of the data to import (YYYY-MM-DD); defaults to yesterday if not passed</td></tr><tr><td>db_name</td><td>Database name</td></tr><tr><td>db_addr</td><td>Database IP address</td></tr><tr><td>db_user</td><td>Database user</td></tr><tr><td>db_password</td><td>Database password</td></tr></tbody></table><blockquote><p>The source data starts on 2020-03-15.<br>Each import snippet below is appended to the base script above and executed together with it.</p></blockquote><ul><li><p>Snippet for full-load tables</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span 
class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span 
class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br></pre></td><td class="code"><pre><span class="line">import_data &quot;base_dic&quot; &quot;select</span><br><span class="line">dic_code,</span><br><span class="line">dic_name,</span><br><span class="line">parent_code,</span><br><span class="line">create_time,</span><br><span class="line">operate_time</span><br><span class="line">from base_dic</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;base_trademark&quot; &quot;select</span><br><span class="line">tm_id,</span><br><span class="line">tm_name</span><br><span class="line">from base_trademark</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;base_category3&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">name,</span><br><span class="line">category2_id</span><br><span class="line">from base_category3 where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;base_category2&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">name,</span><br><span class="line">category1_id</span><br><span class="line">from base_category2 where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;base_category1&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">name</span><br><span class="line">from base_category1 where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;activity_rule&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">activity_id,</span><br><span class="line">condition_amount,</span><br><span class="line">condition_num,</span><br><span class="line">benefit_amount,</span><br><span class="line">benefit_discount,</span><br><span class="line">benefit_level</span><br><span class="line">from activity_rule</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span 
class="line">import_data &quot;activity_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">activity_name,</span><br><span class="line">activity_type,</span><br><span class="line">start_time,</span><br><span class="line">end_time,</span><br><span class="line">create_time</span><br><span class="line">from activity_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;activity_sku&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">activity_id,</span><br><span class="line">sku_id,</span><br><span class="line">create_time</span><br><span class="line">FROM</span><br><span class="line">activity_sku</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;cart_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">cart_price,</span><br><span class="line">sku_num,</span><br><span class="line">sku_name,</span><br><span class="line">create_time,</span><br><span class="line">operate_time,</span><br><span class="line">is_ordered,</span><br><span class="line">order_time</span><br><span class="line">from cart_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;favor_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">spu_id,</span><br><span class="line">is_cancel,</span><br><span class="line">create_time,</span><br><span class="line">cancel_time</span><br><span class="line">from favor_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;coupon_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">coupon_name,</span><br><span class="line">coupon_type,</span><br><span 
class="line">condition_amount,</span><br><span class="line">condition_num,</span><br><span class="line">activity_id,</span><br><span class="line">benefit_amount,</span><br><span class="line">benefit_discount,</span><br><span class="line">create_time,</span><br><span class="line">range_type,</span><br><span class="line">spu_id,</span><br><span class="line">tm_id,</span><br><span class="line">category3_id,</span><br><span class="line">limit_num,</span><br><span class="line">operate_time,</span><br><span class="line">expire_time</span><br><span class="line">from coupon_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;sku_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">spu_id,</span><br><span class="line">price,</span><br><span class="line">sku_name,</span><br><span class="line">sku_desc,</span><br><span class="line">weight,</span><br><span class="line">tm_id,</span><br><span class="line">category3_id,</span><br><span class="line">create_time</span><br><span class="line">from sku_info where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;spu_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">spu_name,</span><br><span class="line">category3_id,</span><br><span class="line">tm_id</span><br><span class="line">from spu_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br></pre></td></tr></table></figure></li><li><p>Snippet for special tables</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">import_data &quot;base_province&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">name,</span><br><span class="line">region_id,</span><br><span class="line">area_code,</span><br><span class="line">iso_code</span><br><span class="line">from base_province</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;base_region&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">region_name</span><br><span class="line">from base_region</span><br><span class="line">where 1&#x3D;1&quot;</span><br><span class="line">import_data &quot;date_info&quot; &quot;select</span><br><span class="line">date_id,</span><br><span class="line">week_id,</span><br><span class="line">week_day,</span><br><span class="line">day,</span><br><span class="line">month,</span><br><span class="line">quarter,</span><br><span class="line">year,</span><br><span class="line">is_workday,</span><br><span class="line">holiday_id</span><br><span class="line">from date_info</span><br><span class="line">where 1&#x3D;1&quot;</span><br></pre></td></tr></table></figure></li><li><p>Snippet for incremental tables</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br></pre></td><td class="code"><pre><span class="line">import_data &quot;order_refund_info&quot; &quot;select</span><br><span class="line">id,</span><br><span 
class="line">user_id,</span><br><span class="line">order_id,</span><br><span class="line">sku_id,</span><br><span class="line">refund_type,</span><br><span class="line">refund_num,</span><br><span class="line">refund_amount,</span><br><span class="line">refund_reason_type,</span><br><span class="line">create_time</span><br><span class="line">from order_refund_info</span><br><span class="line">where</span><br><span class="line">date_format(create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br><span class="line">import_data &quot;order_status_log&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">order_id,</span><br><span class="line">order_status,</span><br><span class="line">operate_time</span><br><span class="line">from order_status_log</span><br><span class="line">where</span><br><span class="line">date_format(operate_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br><span class="line">import_data &quot;payment_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">out_trade_no,</span><br><span class="line">order_id,</span><br><span class="line">user_id,</span><br><span class="line">alipay_trade_no,</span><br><span class="line">total_amount,</span><br><span class="line">subject,</span><br><span class="line">payment_type,</span><br><span class="line">payment_time</span><br><span class="line">from payment_info</span><br><span class="line">where</span><br><span class="line">DATE_FORMAT(payment_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br><span class="line">import_data &quot;order_detail&quot; &quot;select</span><br><span class="line">od.id,</span><br><span class="line">od.order_id,</span><br><span class="line">oi.user_id,</span><br><span class="line">od.sku_id,</span><br><span class="line">od.sku_name,</span><br><span class="line">od.order_price,</span><br><span class="line">od.sku_num,</span><br><span class="line">od.create_time</span><br><span 
class="line">from order_detail od</span><br><span class="line">join order_info oi</span><br><span class="line">on od.order_id&#x3D;oi.id</span><br><span class="line">where</span><br><span class="line">DATE_FORMAT(od.create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br><span class="line">import_data &quot;activity_order&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">activity_id,</span><br><span class="line">order_id,</span><br><span class="line">create_time</span><br><span class="line">from activity_order</span><br><span class="line">where</span><br><span class="line">date_format(create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br><span class="line">import_data &quot;comment_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">spu_id,</span><br><span class="line">order_id,</span><br><span class="line">appraise,</span><br><span class="line">comment_txt,</span><br><span class="line">create_time</span><br><span class="line">from comment_info</span><br><span class="line">where date_format(create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;&quot;</span><br></pre></td></tr></table></figure></li><li><p>Snippet for new-and-changed tables</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">import_data &quot;coupon_use&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">coupon_id,</span><br><span class="line">user_id,</span><br><span class="line">order_id,</span><br><span class="line">coupon_status,</span><br><span class="line">get_time,</span><br><span class="line">using_time,</span><br><span class="line">used_time</span><br><span class="line">from coupon_use</span><br><span class="line">where (date_format(get_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">or date_format(using_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">or date_format(used_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;)&quot;</span><br><span class="line">import_data &quot;user_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">name,</span><br><span class="line">birthday,</span><br><span class="line">gender,</span><br><span class="line">email,</span><br><span class="line">user_level,</span><br><span class="line">create_time,</span><br><span class="line">operate_time</span><br><span class="line">from user_info</span><br><span class="line">where (DATE_FORMAT(create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">or 
DATE_FORMAT(operate_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;)&quot;</span><br><span class="line">import_data &quot;order_info&quot; &quot;select</span><br><span class="line">id,</span><br><span class="line">final_total_amount,</span><br><span class="line">order_status,</span><br><span class="line">user_id,</span><br><span class="line">out_trade_no,</span><br><span class="line">create_time,</span><br><span class="line">operate_time,</span><br><span class="line">province_id,</span><br><span class="line">benefit_reduce_amount,</span><br><span class="line">original_total_amount,</span><br><span class="line">feight_fee</span><br><span class="line">from order_info</span><br><span class="line">where (date_format(create_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">or date_format(operate_time,&#39;%Y-%m-%d&#39;)&#x3D;&#39;$db_date&#39;)&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle33"></a></p><h3 id="8-3-ods层数据加载"><a href="#8-3-ods层数据加载" class="headerlink" title="8.3 ods层数据加载"></a>8.3 Loading data into the ODS layer</h3></li><li><p>For each table, only $table_name in the script needs to be changed.</p><blockquote><p>Pay attention to the data export directories of the two tracking-log tables.</p></blockquote></li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span 
class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ods</span><br><span class="line">table_name&#x3D;base_dic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># 如果是输入的日期按照取输入日期；如果没输入日期取当前时间的前一天</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">load data inpath &#39;&#x2F;origin_data&#x2F;$APP1&#x2F;$table_name&#x2F;$db_date&#39; OVERWRITE into table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;);</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle34"></a></p><h2 id="9-dwd层构建"><a href="#9-dwd层构建" class="headerlink" title="9 dwd层构建"></a>9 dwd层构建</h2><p><a name="blogTitle35"></a></p><h3 id="9-1-dwd层构建（启动-事件日志）"><a href="#9-1-dwd层构建（启动-事件日志）" class="headerlink" title="9.1 dwd层构建（启动-事件日志）"></a>9.1 dwd层构建（启动-事件日志）</h3><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446614-4652a8e0-0b0a-4e2a-910b-e90751aed058.png#align=left&display=inline&height=446&margin=%5Bobject%20Object%5D&originHeight=446&originWidth=951&size=0&status=done&style=none&width=951" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445749-02840e2e-dcb1-4a8a-8778-e6ee6b650936.png#align=left&display=inline&height=614&margin=%5Bobject%20Object%5D&originHeight=614&originWidth=1477&size=0&status=done&style=none&width=1477" alt><br><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445567-5b7c6d42-4d24-4807-afb3-cdea023f4f90.png#align=left&display=inline&height=477&margin=%5Bobject%20Object%5D&originHeight=477&originWidth=951&size=0&status=done&style=none&width=951" alt><br><a name="65a81e92"></a></p><h4 id="9-1-1-启动日志表"><a href="#9-1-1-启动日志表" class="headerlink" title="9.1.1 启动日志表"></a>9.1.1 Startup log table</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445612-d6f7562c-f0bb-430f-b4af-7b42344e971f.png#align=left&display=inline&height=730&margin=%5Bobject%20Object%5D&originHeight=730&originWidth=1437&size=0&status=done&style=none&width=1437" alt></p><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__start_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__start_log&#96;(</span><br><span class="line">  
&#96;mid_id&#96; string COMMENT &#39;unique device ID&#39;,</span><br><span class="line">  &#96;user_id&#96; string COMMENT &#39;user ID&#39;,</span><br><span class="line">  &#96;version_code&#96; string COMMENT &#39;app version code&#39;,</span><br><span class="line">  &#96;version_name&#96; string COMMENT &#39;app version name&#39;,</span><br><span class="line">  &#96;lang&#96; string COMMENT &#39;system language&#39;,</span><br><span class="line">  &#96;source&#96; string COMMENT &#39;channel&#39;,</span><br><span class="line">  &#96;os&#96; string COMMENT &#39;OS version&#39;,</span><br><span class="line">  &#96;area&#96; string COMMENT &#39;region&#39;,</span><br><span class="line">  &#96;model&#96; string COMMENT &#39;phone model&#39;,</span><br><span class="line">  &#96;brand&#96; string COMMENT &#39;phone brand&#39;,</span><br><span class="line">  &#96;sdk_version&#96; string COMMENT &#39;sdkVersion&#39;,</span><br><span class="line">  &#96;gmail&#96; string COMMENT &#39;gmail&#39;,</span><br><span class="line">  &#96;height_width&#96; string COMMENT &#39;screen height and width&#39;,</span><br><span class="line">  &#96;app_time&#96; string COMMENT &#39;time the log was generated on the client&#39;,</span><br><span class="line">  &#96;network&#96; string COMMENT &#39;network type&#39;,</span><br><span class="line">  &#96;lng&#96; string COMMENT &#39;longitude&#39;,</span><br><span class="line">  &#96;lat&#96; string COMMENT &#39;latitude&#39;,</span><br><span class="line">  &#96;entry&#96; string COMMENT &#39;entry point: push&#x3D;1,widget&#x3D;2,icon&#x3D;3,notification&#x3D;4,lockscreen_widget&#x3D;5&#39;,</span><br><span class="line">  &#96;open_ad_type&#96; string COMMENT &#39;splash ad type: native splash ad&#x3D;1, interstitial splash ad&#x3D;2&#39;,</span><br><span class="line">  &#96;action&#96; string COMMENT &#39;status: success&#x3D;1, failure&#x3D;2&#39;,</span><br><span class="line">  &#96;loading_time&#96; string COMMENT &#39;loading duration&#39;,</span><br><span class="line">  &#96;detail&#96; string COMMENT &#39;failure code&#39;,</span><br><span class="line">  &#96;extend1&#96; string COMMENT &#39;failure message&#39;</span><br><span class="line">  ) COMMENT 
&#39;启动日志表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;start_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>数据导入</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;start_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># Use the date passed in if there is one; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">get_json_object(line,&#39;$.mid&#39;) mid_id,</span><br><span class="line">get_json_object(line,&#39;$.uid&#39;) user_id,</span><br><span class="line">get_json_object(line,&#39;$.vc&#39;) version_code,</span><br><span class="line">get_json_object(line,&#39;$.vn&#39;) version_name,</span><br><span class="line">get_json_object(line,&#39;$.l&#39;) lang,</span><br><span class="line">get_json_object(line,&#39;$.sr&#39;) source,</span><br><span class="line">get_json_object(line,&#39;$.os&#39;) os,</span><br><span class="line">get_json_object(line,&#39;$.ar&#39;) area,</span><br><span class="line">get_json_object(line,&#39;$.md&#39;) model,</span><br><span 
class="line">get_json_object(line,&#39;$.ba&#39;) brand,</span><br><span class="line">get_json_object(line,&#39;$.sv&#39;) sdk_version,</span><br><span class="line">get_json_object(line,&#39;$.g&#39;) gmail,</span><br><span class="line">get_json_object(line,&#39;$.hw&#39;) height_width,</span><br><span class="line">get_json_object(line,&#39;$.t&#39;) app_time,</span><br><span class="line">get_json_object(line,&#39;$.nw&#39;) network,</span><br><span class="line">get_json_object(line,&#39;$.ln&#39;) lng,</span><br><span class="line">get_json_object(line,&#39;$.la&#39;) lat,</span><br><span class="line">get_json_object(line,&#39;$.entry&#39;) entry,</span><br><span class="line">get_json_object(line,&#39;$.open_ad_type&#39;) open_ad_type,</span><br><span class="line">get_json_object(line,&#39;$.action&#39;) action,</span><br><span class="line">get_json_object(line,&#39;$.loading_time&#39;) loading_time,</span><br><span class="line">get_json_object(line,&#39;$.detail&#39;) detail,</span><br><span class="line">get_json_object(line,&#39;$.extend1&#39;) extend1</span><br><span class="line">from $hive_origin_table_name</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="396f2a2b"></a></p><h4 id="9-1-2-事件日志表"><a href="#9-1-2-事件日志表" class="headerlink" title="9.1.2 事件日志表"></a>9.1.2 事件日志表</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445586-766cb2b8-d761-4c9a-a289-16f60a99ddcd.png#align=left&display=inline&height=477&margin=%5Bobject%20Object%5D&originHeight=477&originWidth=951&size=0&status=done&style=none&width=951" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445702-d1a78ffa-edf3-4d58-a3bb-dab7ac195815.png#align=left&display=inline&height=604&margin=%5Bobject%20Object%5D&originHeight=604&originWidth=1333&size=0&status=done&style=none&width=1333" alt><br><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445677-bc019011-2053-412d-b4c8-bfd856f29547.png#align=left&display=inline&height=603&margin=%5Bobject%20Object%5D&originHeight=603&originWidth=1333&size=0&status=done&style=none&width=1333" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445698-37476ac6-25f3-4cea-86ac-f907a3cb1d1d.png#align=left&display=inline&height=621&margin=%5Bobject%20Object%5D&originHeight=621&originWidth=1376&size=0&status=done&style=none&width=1376" alt><br></p></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__event_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__event_log&#96;(</span><br><span class="line">  &#96;mid_id&#96; string COMMENT &#39;Unique device ID&#39;,</span><br><span class="line">  &#96;user_id&#96; string COMMENT &#39;User ID&#39;,</span><br><span class="line">  &#96;version_code&#96; string COMMENT &#39;App version code&#39;,</span><br><span 
class="line">  &#96;version_name&#96; string COMMENT &#39;程序版本名&#39;,</span><br><span class="line">  &#96;lang&#96; string COMMENT &#39;系统语言&#39;,</span><br><span class="line">  &#96;source&#96; string COMMENT &#39;渠道号&#39;,</span><br><span class="line">  &#96;os&#96; string COMMENT &#39;系统版本&#39;,</span><br><span class="line">  &#96;area&#96; string COMMENT &#39;区域&#39;,</span><br><span class="line">  &#96;model&#96; string COMMENT &#39;手机型号&#39;,</span><br><span class="line">  &#96;brand&#96; string COMMENT &#39;手机品牌&#39;,</span><br><span class="line">  &#96;sdk_version&#96; string COMMENT &#39;sdkVersion&#39;,</span><br><span class="line">  &#96;gmail&#96; string COMMENT &#39;gmail&#39;,</span><br><span class="line">  &#96;height_width&#96; string COMMENT &#39;屏幕宽高&#39;,</span><br><span class="line">  &#96;app_time&#96; string COMMENT &#39;客户端日志产生时的时间&#39;,</span><br><span class="line">  &#96;network&#96; string COMMENT &#39;网络模式&#39;,</span><br><span class="line">  &#96;lng&#96; string COMMENT &#39;经度&#39;,</span><br><span class="line">  &#96;lat&#96; string COMMENT &#39;纬度&#39;,</span><br><span class="line">  &#96;event_name&#96; string COMMENT &#39;事件名称&#39;,</span><br><span class="line">  &#96;event_json&#96; string COMMENT &#39;事件详情&#39;,</span><br><span class="line">  &#96;server_time&#96; string COMMENT &#39;服务器时间&#39;</span><br><span class="line">  ) COMMENT &#39;事件日志表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;event_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure><p><a name="3a5115eb"></a></p><h5 
id="9-2-1-制作-UDF-UDTF"><a href="#9-2-1-制作-UDF-UDTF" class="headerlink" title="9.2.1 制作 UDF UDTF"></a>9.1.2.1 Building the UDF and UDTF</h5></li><li><p>udf</p></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445696-3b581062-c5d4-427f-ba12-e6867d59b49c.png#align=left&display=inline&height=417&margin=%5Bobject%20Object%5D&originHeight=417&originWidth=1318&size=0&status=done&style=none&width=1318" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445883-e8c65772-697b-4f7a-9dd5-d454e8615019.png#align=left&display=inline&height=534&margin=%5Bobject%20Object%5D&originHeight=534&originWidth=1315&size=0&status=done&style=none&width=1315" alt></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span 
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line">package com.heaton.bigdata.udf;</span><br><span class="line">import org.apache.commons.lang.StringUtils;</span><br><span class="line">import org.apache.hadoop.hive.ql.exec.UDF;</span><br><span class="line">import org.json.JSONException;</span><br><span class="line">import org.json.JSONObject;</span><br><span class="line">public class BaseFieldUDF extends UDF &#123;</span><br><span class="line">    public String evaluate(String line, String key) throws JSONException &#123;</span><br><span class="line">        String[] log &#x3D; line &#x3D;&#x3D; null ? new String[0] : line.split(&quot;\\|&quot;, 2); &#x2F;&#x2F; limit 2 keeps any | inside the JSON payload</span><br><span class="line">        if (log.length !&#x3D; 2 || StringUtils.isBlank(log[1])) &#123;</span><br><span class="line">            return &quot;&quot;;</span><br><span class="line">        &#125;</span><br><span class="line">        JSONObject baseJson &#x3D; new JSONObject(log[1].trim());</span><br><span class="line">        String result &#x3D; &quot;&quot;;</span><br><span class="line">        &#x2F;&#x2F; Server time: the text before the |</span><br><span class="line">        if (&quot;st&quot;.equals(key)) &#123;</span><br><span class="line">            result &#x3D; log[0].trim();</span><br><span class="line">        &#125; else if (&quot;et&quot;.equals(key)) &#123;</span><br><span class="line">            &#x2F;&#x2F; Event array</span><br><span class="line">            if (baseJson.has(&quot;et&quot;)) 
&#123;</span><br><span class="line">                result &#x3D; baseJson.getString(&quot;et&quot;);</span><br><span class="line">            &#125;</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            JSONObject cm &#x3D; baseJson.getJSONObject(&quot;cm&quot;);</span><br><span class="line">        &#x2F;&#x2F; 获取 key 对应公共字段的 value</span><br><span class="line">            if (cm.has(key)) &#123;</span><br><span class="line">                result &#x3D; cm.getString(key);</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        return result;</span><br><span class="line">    &#125;</span><br><span class="line">    public static void main(String[] args) throws JSONException &#123;</span><br><span class="line">        String line &#x3D; &quot;         1588319303710|&#123;\n&quot; +</span><br><span class="line">                &quot;        \&quot;cm\&quot;:&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ln\&quot;:\&quot;-51.5\&quot;,\&quot;sv\&quot;:\&quot;V2.0.7\&quot;,\&quot;os\&quot;:\&quot;8.0.8\&quot;,\&quot;g\&quot;:\&quot;L1470998@gmail.com\&quot;,\&quot;mid\&quot;:\&quot;13\&quot;,\n&quot; +</span><br><span class="line">                &quot;                    \&quot;nw\&quot;:\&quot;4G\&quot;,\&quot;l\&quot;:\&quot;en\&quot;,\&quot;vc\&quot;:\&quot;7\&quot;,\&quot;hw\&quot;:\&quot;640*960\&quot;,\&quot;ar\&quot;:\&quot;MX\&quot;,\&quot;uid\&quot;:\&quot;13\&quot;,\&quot;t\&quot;:\&quot;1588291826938\&quot;,\n&quot; +</span><br><span class="line">                &quot;                    \&quot;la\&quot;:\&quot;-38.2\&quot;,\&quot;md\&quot;:\&quot;Huawei-14\&quot;,\&quot;vn\&quot;:\&quot;1.3.6\&quot;,\&quot;ba\&quot;:\&quot;Huawei\&quot;,\&quot;sr\&quot;:\&quot;Y\&quot;\n&quot; +</span><br><span class="line">                &quot;        &#125;,\n&quot; +</span><br><span class="line">                
&quot;        \&quot;ap\&quot;:\&quot;app\&quot;,\n&quot; +</span><br><span class="line">                &quot;                \&quot;et\&quot;:[&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ett\&quot;:\&quot;1588228193191\&quot;,\&quot;en\&quot;:\&quot;ad\&quot;,\&quot;kv\&quot;:&#123;\&quot;activityId\&quot;:\&quot;1\&quot;,\&quot;displayMills\&quot;:\&quot;113201\&quot;,\&quot;entry\&quot;:\&quot;3\&quot;,\&quot;action\&quot;:\&quot;5\&quot;,\&quot;contentType\&quot;:\&quot;0\&quot;&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;,&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ett\&quot;:\&quot;1588300304713\&quot;,\&quot;en\&quot;:\&quot;notification\&quot;,\&quot;kv\&quot;:&#123;\&quot;ap_time\&quot;:\&quot;1588277440794\&quot;,\&quot;action\&quot;:\&quot;2\&quot;,\&quot;type\&quot;:\&quot;3\&quot;,\&quot;content\&quot;:\&quot;\&quot;&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;,&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ett\&quot;:\&quot;1588249203743\&quot;,\&quot;en\&quot;:\&quot;active_background\&quot;,\&quot;kv\&quot;:&#123;\&quot;active_source\&quot;:\&quot;3\&quot;&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;,&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ett\&quot;:\&quot;1588225856101\&quot;,\&quot;en\&quot;:\&quot;comment\&quot;,\&quot;kv\&quot;:&#123;\&quot;p_comment_id\&quot;:0,\&quot;addtime\&quot;:\&quot;1588263895040\&quot;,\&quot;praise_count\&quot;:231,\&quot;other_id\&quot;:5,\&quot;comment_id\&quot;:5,\&quot;reply_count\&quot;:62,\&quot;userid\&quot;:7,\&quot;content\&quot;:\&quot;骸汞\&quot;&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;,&#123;\n&quot; +</span><br><span class="line">                &quot;            
\&quot;ett\&quot;:\&quot;1588254200122\&quot;,\&quot;en\&quot;:\&quot;favorites\&quot;,\&quot;kv\&quot;:&#123;\&quot;course_id\&quot;:5,\&quot;id\&quot;:0,\&quot;add_time\&quot;:\&quot;1588264138625\&quot;,\&quot;userid\&quot;:0&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;,&#123;\n&quot; +</span><br><span class="line">                &quot;            \&quot;ett\&quot;:\&quot;1588281152824\&quot;,\&quot;en\&quot;:\&quot;praise\&quot;,\&quot;kv\&quot;:&#123;\&quot;target_id\&quot;:4,\&quot;id\&quot;:3,\&quot;type\&quot;:3,\&quot;add_time\&quot;:\&quot;1588307696417\&quot;,\&quot;userid\&quot;:8&#125;\n&quot; +</span><br><span class="line">                &quot;        &#125;]\n&quot; +</span><br><span class="line">                &quot;    &#125;&quot;;</span><br><span class="line">        String s &#x3D; new BaseFieldUDF().evaluate(line, &quot;mid&quot;);</span><br><span class="line">        String ss &#x3D; new BaseFieldUDF().evaluate(line, &quot;st&quot;);</span><br><span class="line">        String sss &#x3D; new BaseFieldUDF().evaluate(line, &quot;et&quot;);</span><br><span class="line">        System.out.println(s);</span><br><span class="line">        System.out.println(ss);</span><br><span class="line">        System.out.println(sss);</span><br><span class="line">    &#125;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><blockquote><p>结果：<br>13<br>1588319303710<br>[{“ett”:”1588228193191”,”en”:”ad”,”kv”:{“activityId”:”1”,”displayMills”:”113201”,”entry”:”3”,”action”:”5”,”contentType”:”0”}},{“ett”:”1588300304713”,”en”:”notification”,”kv”:{“ap_time”:”1588277440794”,”action”:”2”,”type”:”3”,”content”:””}},{“ett”:”1588249203743”,”en”:”active_background”,”kv”:{“active_source”:”3”}},{“ett”:”1588225856101”,”en”:”comment”,”kv”:{“p_comment_id”:0,”addtime”:”1588263895040”,”praise_count”:231,”other_id”:5,”comment_id”:5,”reply_count”:62,”userid”:7,”content”:”骸汞”}},{“ett”:”1588254200122”,”en”:”favorites”,”kv”:{“course_id”:5,”id”:0,”add_time”:”1588264138625”,”userid”:0}},{“ett”:”1588281152824”,”en”:”praise”,”kv”:{“target_id”:4,”id”:3,”type”:3,”add_time”:”1588307696417”,”userid”:8}}]</p></blockquote><ul><li>udtf</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446239-9b99f151-4d88-4505-ab08-fb212a6d8ac7.png#align=left&display=inline&height=339&margin=%5Bobject%20Object%5D&originHeight=339&originWidth=1207&size=0&status=done&style=none&width=1207" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446059-b20b058a-4006-4c97-bce2-84b85cd28599.png#align=left&display=inline&height=624&margin=%5Bobject%20Object%5D&originHeight=624&originWidth=1205&size=0&status=done&style=none&width=1205" alt></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><span class="line">package com.heaton.bigdata.udtf;</span><br><span class="line">import org.apache.commons.lang.StringUtils;</span><br><span class="line">import org.apache.hadoop.hive.ql.exec.UDFArgumentException;</span><br><span class="line">import org.apache.hadoop.hive.ql.metadata.HiveException;</span><br><span class="line">import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;</span><br><span class="line">import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;</span><br><span class="line">import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;</span><br><span 
class="line">import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;</span><br><span class="line">import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;</span><br><span class="line">import org.json.JSONArray;</span><br><span class="line">import org.json.JSONException;</span><br><span class="line">import java.util.ArrayList;</span><br><span class="line">public class EventJsonUDTF extends GenericUDTF &#123;</span><br><span class="line">    &#x2F;&#x2F; Declare the output columns: event_name and event_json, both strings</span><br><span class="line">    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException &#123;</span><br><span class="line">        ArrayList&lt;String&gt; fieldNames &#x3D; new ArrayList&lt;String&gt;();</span><br><span class="line">        ArrayList&lt;ObjectInspector&gt; fieldOIs &#x3D; new ArrayList&lt;ObjectInspector&gt;();</span><br><span class="line">        fieldNames.add(&quot;event_name&quot;);</span><br><span class="line">        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);</span><br><span class="line">        fieldNames.add(&quot;event_json&quot;);</span><br><span class="line">        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);</span><br><span class="line">        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,</span><br><span class="line">                fieldOIs);</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;&#x2F; One input row (an event array) yields zero or more output rows, one per event</span><br><span class="line">    @Override</span><br><span class="line">    public void process(Object[] objects) throws HiveException &#123;</span><br><span class="line">        &#x2F;&#x2F; The et field (JSON event array) passed in; may be NULL</span><br><span class="line">        String input &#x3D; objects[0] &#x3D;&#x3D; null ? null : objects[0].toString();</span><br><span class="line">        &#x2F;&#x2F; Drop the row if the input is empty</span><br><span class="line">        if (StringUtils.isBlank(input)) 
&#123;</span><br><span class="line">            return;</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            try &#123;</span><br><span class="line">                &#x2F;&#x2F; Parse the event array (ad, favorites, ...)</span><br><span class="line">                JSONArray ja &#x3D; new JSONArray(input);</span><br><span class="line">                if (ja &#x3D;&#x3D; null)</span><br><span class="line">                    return;</span><br><span class="line">                &#x2F;&#x2F; Iterate over each event</span><br><span class="line">                for (int i &#x3D; 0; i &lt; ja.length(); i++) &#123;</span><br><span class="line">                    String[] result &#x3D; new String[2];</span><br><span class="line">                    try &#123;</span><br><span class="line">                        &#x2F;&#x2F; Event name (ad, favorites, ...)</span><br><span class="line">                        result[0] &#x3D; ja.getJSONObject(i).getString(&quot;en&quot;);</span><br><span class="line">                        &#x2F;&#x2F; The whole event as a JSON string</span><br><span class="line">                        result[1] &#x3D; ja.getJSONObject(i).toString();</span><br><span class="line">                    &#125; catch (JSONException e) &#123;</span><br><span class="line">                        continue;</span><br><span class="line">                    &#125;</span><br><span class="line">                    &#x2F;&#x2F; Emit one output row</span><br><span class="line">                    forward(result);</span><br><span class="line">                &#125;</span><br><span class="line">            &#125; catch (JSONException e) &#123;</span><br><span class="line">                e.printStackTrace();</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    &#x2F;&#x2F; Called once all rows have been processed; use it for cleanup or to emit extra rows</span><br><span class="line">    @Override</span><br><span class="line">    public void close() 
throws HiveException &#123;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><a name="86de846c"></a></p><h5 id="9-1-2-2-直接永久使用UDF"><a href="#9-1-2-2-直接永久使用UDF" class="headerlink" title="9.1.2.2 直接永久使用UDF"></a>9.1.2.2 Registering Permanent UDFs</h5><ul><li>Upload the UDF jar<blockquote><p>Upload the hive-function-1.0-SNAPSHOT jar to /user/hive/jars on HDFS:</p></blockquote></li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hdfs dfs -mkdir -p &#x2F;user&#x2F;hive&#x2F;jars</span><br><span class="line">hdfs dfs -put hive-function-1.0-SNAPSHOT.jar &#x2F;user&#x2F;hive&#x2F;jars&#x2F;hive-function-1.0-SNAPSHOT.jar</span><br></pre></td></tr></table></figure><blockquote><p>Create the permanent UDFs in Hive:</p></blockquote><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">create function base_analizer as &#39;com.heaton.bigdata.udf.BaseFieldUDF&#39; using jar &#39;hdfs:&#x2F;&#x2F;cdh01.cm:8020&#x2F;user&#x2F;hive&#x2F;jars&#x2F;hive-function-1.0-SNAPSHOT.jar&#39;;</span><br><span class="line">create function flat_analizer as &#39;com.heaton.bigdata.udtf.EventJsonUDTF&#39; using jar &#39;hdfs:&#x2F;&#x2F;cdh01.cm:8020&#x2F;user&#x2F;hive&#x2F;jars&#x2F;hive-function-1.0-SNAPSHOT.jar&#39;;</span><br></pre></td></tr></table></figure><p><a name="620b459a"></a></p><h5 id="9-1-2-3-Dolphin使用方式UDF"><a href="#9-1-2-3-Dolphin使用方式UDF" class="headerlink" title="9.1.2.3 Dolphin使用方式UDF"></a>9.1.2.3 Using the UDFs in DolphinScheduler</h5><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445818-d1c8fc14-e529-4591-805f-7345bab479e8.png#align=left&display=inline&height=330&margin=%5Bobject%20Object%5D&originHeight=330&originWidth=1906&size=0&status=done&style=none&width=1906" alt><br><img 
src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445645-ab931a0b-bb10-4b9a-9c67-c42afdb4132d.png#align=left&display=inline&height=334&margin=%5Bobject%20Object%5D&originHeight=334&originWidth=1722&size=0&status=done&style=none&width=1722" alt></p><blockquote><p>To use a UDF, select it in the SQL task node of the DAG editor. In DolphinScheduler 1.2.0, however, saving the function association currently has no effect.<br>As a workaround, use the UDF management feature to upload the JAR to HDFS, then register temporary functions in the script. This works just as well.<br>Temporary function statements:</p></blockquote><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">create temporary function base_analizer as &#39;com.heaton.bigdata.udf.BaseFieldUDF&#39; using jar &#39;hdfs:&#x2F;&#x2F;cdh01.cm:8020&#x2F;dolphinscheduler&#x2F;dolphinscheduler&#x2F;udfs&#x2F;hive-function-1.0-SNAPSHOT.jar&#39;;</span><br><span class="line">create temporary function flat_analizer as &#39;com.heaton.bigdata.udtf.EventJsonUDTF&#39; using jar &#39;hdfs:&#x2F;&#x2F;cdh01.cm:8020&#x2F;dolphinscheduler&#x2F;dolphinscheduler&#x2F;udfs&#x2F;hive-function-1.0-SNAPSHOT.jar&#39;;</span><br></pre></td></tr></table></figure><p><a name="29e0a488"></a></p><h5 id="9-2-4-数据导入"><a href="#9-2-4-数据导入" class="headerlink" title="9.2.4 数据导入"></a>9.1.2.4 Data Import</h5><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;event_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># Use the date passed in if there is one; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">base_analizer(line,&#39;mid&#39;) as mid_id,</span><br><span 
class="line">base_analizer(line,&#39;uid&#39;) as user_id,</span><br><span class="line">base_analizer(line,&#39;vc&#39;) as version_code,</span><br><span class="line">base_analizer(line,&#39;vn&#39;) as version_name,</span><br><span class="line">base_analizer(line,&#39;l&#39;) as lang,</span><br><span class="line">base_analizer(line,&#39;sr&#39;) as source,</span><br><span class="line">base_analizer(line,&#39;os&#39;) as os,</span><br><span class="line">base_analizer(line,&#39;ar&#39;) as area,</span><br><span class="line">base_analizer(line,&#39;md&#39;) as model,</span><br><span class="line">base_analizer(line,&#39;ba&#39;) as brand,</span><br><span class="line">base_analizer(line,&#39;sv&#39;) as sdk_version,</span><br><span class="line">base_analizer(line,&#39;g&#39;) as gmail,</span><br><span class="line">base_analizer(line,&#39;hw&#39;) as height_width,</span><br><span class="line">base_analizer(line,&#39;t&#39;) as app_time,</span><br><span class="line">base_analizer(line,&#39;nw&#39;) as network,</span><br><span class="line">base_analizer(line,&#39;ln&#39;) as lng,</span><br><span class="line">base_analizer(line,&#39;la&#39;) as lat,</span><br><span class="line">event_name,</span><br><span class="line">event_json,</span><br><span class="line">base_analizer(line,&#39;st&#39;) as server_time</span><br><span class="line">from $hive_origin_table_name lateral view flat_analizer(base_analizer(line,&#39;et&#39;)) tmp_flat as event_name,event_json</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and base_analizer(line,&#39;et&#39;)&lt;&gt;&#39;&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="44c27f4d"></a></p><h4 id="9-1-3-商品点击表"><a href="#9-1-3-商品点击表" class="headerlink" title="9.1.3 商品点击表"></a>9.1.3 商品点击表</h4><ul><li><p>建表</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__display_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__display_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span 
class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;action&#96; string,</span><br><span class="line">&#96;goodsid&#96; string,</span><br><span class="line">&#96;place&#96; string,</span><br><span class="line">&#96;extend1&#96; string,</span><br><span class="line">&#96;category&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;Product click table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;display_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;display_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span 
class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.action&#39;) action,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.goodsid&#39;) goodsid,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.place&#39;) place,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.extend1&#39;) extend1,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.category&#39;) category,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;display&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="9a570f18"></a></p><h4 id="9-1-4-商品列表表"><a href="#9-1-4-商品列表表" class="headerlink" title="9.1.4 商品列表表"></a>9.1.4 Product List Table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__loading_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__loading_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;action&#96; string,</span><br><span class="line">&#96;loading_time&#96; string,</span><br><span class="line">&#96;loading_way&#96; string,</span><br><span class="line">&#96;extend1&#96; string,</span><br><span 
class="line">&#96;extend2&#96; string,</span><br><span class="line">&#96;type&#96; string,</span><br><span class="line">&#96;type1&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;Product list table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;loading_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span 
class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;loading_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span 
class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.action&#39;) action,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.loading_time&#39;) loading_time,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.loading_way&#39;) loading_way,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.extend1&#39;) extend1,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.extend2&#39;) extend2,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.type&#39;) type,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.type1&#39;) type1,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;loading&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="502f9d47"></a></p><h4 id="9-1-5-广告表"><a href="#9-1-5-广告表" class="headerlink" title="9.1.5 广告表"></a>9.1.5 Ad Table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__ad_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__ad_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;entry&#96; string,</span><br><span class="line">&#96;action&#96; string,</span><br><span class="line">&#96;contentType&#96; string,</span><br><span class="line">&#96;displayMills&#96; string,</span><br><span class="line">&#96;itemId&#96; string,</span><br><span class="line">&#96;activityId&#96; string,</span><br><span class="line">&#96;server_time&#96; 
string</span><br><span class="line">  ) COMMENT &#39;Ad table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;ad_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;ad_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span 
class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.entry&#39;) entry,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.action&#39;) action,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.contentType&#39;) contentType,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.displayMills&#39;) displayMills,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.itemId&#39;) itemId,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.activityId&#39;) activityId,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;ad&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="a25a0ee7"></a></p><h4 id="9-1-6-消息通知表"><a href="#9-1-6-消息通知表" class="headerlink" title="9.1.6 消息通知表"></a>9.1.6 Notification Table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__notification_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__notification_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;action&#96; string,</span><br><span class="line">&#96;noti_type&#96; string,</span><br><span class="line">&#96;ap_time&#96; string,</span><br><span class="line">&#96;content&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;Notification table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;notification_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span 
class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;notification_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.action&#39;) action,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.noti_type&#39;) noti_type,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.ap_time&#39;) ap_time,</span><br><span 
class="line">get_json_object(event_json,&#39;$.kv.content&#39;) content,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;notification&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="e14d4ccf"></a></p><h4 id="9-1-7-用户后台活跃表"><a href="#9-1-7-用户后台活跃表" class="headerlink" title="9.1.7 用户后台活跃表"></a>9.1.7 User Background Activity Table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__active_background_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__active_background_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span 
class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;active_source&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;User background activity table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;active_background_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;active_background_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION 
(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.active_source&#39;) active_source,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;active_background&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="9f615d68"></a></p><h4 id="9-1-8-评论表"><a href="#9-1-8-评论表" class="headerlink" title="9.1.8 评论表"></a>9.1.8 Comment table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__comment_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__comment_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;comment_id&#96; int,</span><br><span class="line">&#96;userid&#96; int,</span><br><span class="line">&#96;p_comment_id&#96; int,</span><br><span class="line">&#96;content&#96; string,</span><br><span class="line">&#96;addtime&#96; string,</span><br><span class="line">&#96;other_id&#96; int,</span><br><span class="line">&#96;praise_count&#96; 
int,</span><br><span class="line">&#96;reply_count&#96; int,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;评论表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;comment_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span 
class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;comment_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span 
class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.comment_id&#39;) comment_id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.userid&#39;) userid,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.p_comment_id&#39;) p_comment_id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.content&#39;) content,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.addtime&#39;) addtime,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.other_id&#39;) other_id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.praise_count&#39;) praise_count,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.reply_count&#39;) reply_count,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;comment&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="6448bc13"></a></p><h4 id="9-1-9-收藏表"><a href="#9-1-9-收藏表" class="headerlink" title="9.1.9 收藏表"></a>9.1.9 Favorites table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__favorites_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__favorites_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;id&#96; int,</span><br><span class="line">&#96;course_id&#96; int,</span><br><span class="line">&#96;userid&#96; int,</span><br><span class="line">&#96;add_time&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;收藏表&#39;</span><br><span 
class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;favorites_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span 
class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;favorites_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.id&#39;) id,</span><br><span 
class="line">get_json_object(event_json,&#39;$.kv.course_id&#39;) course_id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.userid&#39;) userid,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.add_time&#39;) add_time,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;favorites&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="9790b43b"></a></p><h4 id="9-1-10-点赞表"><a href="#9-1-10-点赞表" class="headerlink" title="9.1.10 点赞表"></a>9.1.10 Like (praise) table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__praise_log;</span><br><span class="line">CREATE EXTERNAL TABLE 
&#96;dwd.mall__praise_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;id&#96; string,</span><br><span class="line">&#96;userid&#96; string,</span><br><span class="line">&#96;target_id&#96; string,</span><br><span class="line">&#96;type&#96; string,</span><br><span class="line">&#96;add_time&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;点赞表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;praise_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;praise_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span 
class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.id&#39;) id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.userid&#39;) userid,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.target_id&#39;) target_id,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.type&#39;) type,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.add_time&#39;) add_time,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;praise&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e 
&quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="ffc7fca0"></a></p><h4 id="9-1-11-错误日志表"><a href="#9-1-11-错误日志表" class="headerlink" title="9.1.11 错误日志表"></a>9.1.11 Error log table</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__error_log;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__error_log&#96;(</span><br><span class="line">&#96;mid_id&#96; string,</span><br><span class="line">&#96;user_id&#96; string,</span><br><span class="line">&#96;version_code&#96; string,</span><br><span class="line">&#96;version_name&#96; string,</span><br><span class="line">&#96;lang&#96; string,</span><br><span class="line">&#96;source&#96; string,</span><br><span class="line">&#96;os&#96; string,</span><br><span class="line">&#96;area&#96; string,</span><br><span class="line">&#96;model&#96; string,</span><br><span class="line">&#96;brand&#96; string,</span><br><span 
class="line">&#96;sdk_version&#96; string,</span><br><span class="line">&#96;gmail&#96; string,</span><br><span class="line">&#96;height_width&#96; string,</span><br><span class="line">&#96;app_time&#96; string,</span><br><span class="line">&#96;network&#96; string,</span><br><span class="line">&#96;lng&#96; string,</span><br><span class="line">&#96;lat&#96; string,</span><br><span class="line">&#96;errorBrief&#96; string,</span><br><span class="line">&#96;errorDetail&#96; string,</span><br><span class="line">&#96;server_time&#96; string</span><br><span class="line">  ) COMMENT &#39;错误日志表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;error_log&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">APP3&#x3D;ods</span><br><span class="line">table_name&#x3D;error_log</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line">hive_origin_table_name&#x3D;$APP3.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">user_id,</span><br><span class="line">version_code,</span><br><span class="line">version_name,</span><br><span 
class="line">lang,</span><br><span class="line">source,</span><br><span class="line">os,</span><br><span class="line">area,</span><br><span class="line">model,</span><br><span class="line">brand,</span><br><span class="line">sdk_version,</span><br><span class="line">gmail,</span><br><span class="line">height_width,</span><br><span class="line">app_time,</span><br><span class="line">network,</span><br><span class="line">lng,</span><br><span class="line">lat,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.errorBrief&#39;) errorBrief,</span><br><span class="line">get_json_object(event_json,&#39;$.kv.errorDetail&#39;) errorDetail,</span><br><span class="line">server_time</span><br><span class="line">from dwd.mall__event_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39; and event_name&#x3D;&#39;error&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle36"></a></p><h3 id="9-2-dwd层构建-业务库"><a href="#9-2-dwd层构建-业务库" class="headerlink" title="9.2 dwd层构建(业务库)"></a>9.2 Building the DWD layer (business database)</h3><blockquote><p>When this layer is first built, incremental tables need dynamic partitioning by date so that each batch of data is written into its corresponding partition.</p></blockquote></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445674-85cdb9f6-3bc5-45af-a42c-260b856251c4.png#align=left&display=inline&height=727&margin=%5Bobject%20Object%5D&originHeight=727&originWidth=1200&size=0&status=done&style=none&width=1200" 
alt></p><table><thead><tr><th>Fact / Dimension</th><th>Time</th><th>User</th><th>Region</th><th>Product</th><th>Coupon</th><th>Activity</th><th>Code</th><th>Measure</th></tr></thead><tbody><tr><td>Order</td><td>√</td><td>√</td><td>√</td><td></td><td></td><td>√</td><td></td><td>Item count / Amount</td></tr><tr><td>Order detail</td><td>√</td><td></td><td>√</td><td>√</td><td></td><td></td><td></td><td>Item count / Amount</td></tr><tr><td>Payment</td><td>√</td><td></td><td>√</td><td></td><td></td><td></td><td></td><td>Count / Amount</td></tr><tr><td>Add to cart</td><td>√</td><td>√</td><td></td><td>√</td><td></td><td></td><td></td><td>Item count / Amount</td></tr><tr><td>Favorite</td><td>√</td><td>√</td><td></td><td>√</td><td></td><td></td><td></td><td>Count</td></tr><tr><td>Review</td><td>√</td><td>√</td><td></td><td>√</td><td></td><td></td><td></td><td>Count</td></tr><tr><td>Refund</td><td>√</td><td>√</td><td></td><td>√</td><td></td><td></td><td></td><td>Item count / Amount</td></tr><tr><td>Coupon claim</td><td>√</td><td>√</td><td></td><td></td><td>√</td><td></td><td></td><td>Count</td></tr></tbody></table><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446714-5b66a38b-d703-4552-a3b2-78e2bb1093b0.png#align=left&display=inline&height=633&margin=%5Bobject%20Object%5D&originHeight=633&originWidth=1129&size=0&status=done&style=none&width=1129" alt><br><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445779-c15f3dba-9e4f-4a43-aa6a-da30699cb8ae.png#align=left&display=inline&height=554&margin=%5Bobject%20Object%5D&originHeight=554&originWidth=1243&size=0&status=done&style=none&width=1243" alt><br><a name="04dcb13e"></a></p><h4 id="9-2-1-商品维度表-全量"><a href="#9-2-1-商品维度表-全量" class="headerlink" title="9.2.1 商品维度表(全量)"></a>9.2.1 Product (SKU) dimension table (full load)</h4><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_sku_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_sku_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;SKU id&#39;,</span><br><span class="line">&#96;spu_id&#96; string COMMENT &#39;SPU id&#39;,</span><br><span class="line">&#96;price&#96; double COMMENT &#39;SKU price&#39;,</span><br><span class="line">&#96;sku_name&#96; string COMMENT &#39;SKU name&#39;,</span><br><span class="line">&#96;sku_desc&#96; string COMMENT &#39;SKU description&#39;,</span><br><span class="line">&#96;weight&#96; double COMMENT &#39;weight&#39;,</span><br><span class="line">&#96;tm_id&#96; string COMMENT &#39;brand id&#39;,</span><br><span class="line">&#96;tm_name&#96; string COMMENT &#39;brand name&#39;,</span><br><span class="line">&#96;category3_id&#96; string COMMENT &#39;level-3 category id&#39;,</span><br><span class="line">&#96;category2_id&#96; string COMMENT &#39;level-2 category id&#39;,</span><br><span class="line">&#96;category1_id&#96; string COMMENT &#39;level-1 category id&#39;,</span><br><span class="line">&#96;category3_name&#96; string COMMENT &#39;level-3 category name&#39;,</span><br><span class="line">&#96;category2_name&#96; string COMMENT &#39;level-2 category name&#39;,</span><br><span class="line">&#96;category1_name&#96; string COMMENT &#39;level-1 category name&#39;,</span><br><span class="line">&#96;spu_name&#96; string COMMENT &#39;SPU name&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;</span><br><span class="line">  ) COMMENT &#39;SKU dimension table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_sku_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span 
class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_sku_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">sku.id,</span><br><span class="line">sku.spu_id,</span><br><span class="line">sku.price,</span><br><span class="line">sku.sku_name,</span><br><span class="line">sku.sku_desc,</span><br><span 
class="line">sku.weight,</span><br><span class="line">sku.tm_id,</span><br><span class="line">ob.tm_name,</span><br><span class="line">sku.category3_id,</span><br><span class="line">c2.id category2_id,</span><br><span class="line">c1.id category1_id,</span><br><span class="line">c3.name category3_name,</span><br><span class="line">c2.name category2_name,</span><br><span class="line">c1.name category1_name,</span><br><span class="line">spu.spu_name,</span><br><span class="line">from_unixtime(cast(sku.create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__sku_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)sku</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__base_trademark where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)ob on sku.tm_id&#x3D;ob.tm_id</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__spu_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)spu on spu.id &#x3D; sku.spu_id</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__base_category3 where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)c3 on sku.category3_id&#x3D;c3.id</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__base_category2 where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)c2 on c3.category2_id&#x3D;c2.id</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__base_category1 where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)c1 on c2.category1_id&#x3D;c1.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="f6fa4154"></a></p><h4 id="9-2-2-优惠券信息维度表-全量"><a href="#9-2-2-优惠券信息维度表-全量" class="headerlink" title="9.2.2 优惠券信息维度表(全量)"></a>9.2.2 Coupon dimension table (full load)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_coupon_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_coupon_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;coupon id&#39;,</span><br><span class="line">&#96;coupon_name&#96; string COMMENT &#39;coupon name&#39;,</span><br><span class="line">&#96;coupon_type&#96; string COMMENT &#39;coupon type: 1 cash, 2 discount, 3 spend-X-save-Y, 4 buy-N-get-discount&#39;,</span><br><span class="line">&#96;condition_amount&#96; string COMMENT &#39;minimum spend amount&#39;,</span><br><span class="line">&#96;condition_num&#96; string COMMENT &#39;minimum item count&#39;,</span><br><span class="line">&#96;activity_id&#96; string COMMENT &#39;activity id&#39;,</span><br><span class="line">&#96;benefit_amount&#96; string COMMENT &#39;amount off&#39;,</span><br><span 
class="line">&#96;benefit_discount&#96; string COMMENT &#39;折扣&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;创建时间&#39;,</span><br><span class="line">&#96;range_type&#96; string COMMENT &#39;范围类型 1、商品 2、品类 3、品牌&#39;,</span><br><span class="line">&#96;spu_id&#96; string COMMENT &#39;商品 id&#39;,</span><br><span class="line">&#96;tm_id&#96; string COMMENT &#39;品牌 id&#39;,</span><br><span class="line">&#96;category3_id&#96; string COMMENT &#39;品类 id&#39;,</span><br><span class="line">&#96;limit_num&#96; string COMMENT &#39;最多领用次数&#39;,</span><br><span class="line">&#96;operate_time&#96; string COMMENT &#39;修改时间&#39;,</span><br><span class="line">&#96;expire_time&#96; string COMMENT &#39;过期时间&#39;</span><br><span class="line">  ) COMMENT &#39;优惠券信息维度表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_coupon_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>数据导入</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_coupon_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">coupon_name,</span><br><span class="line">coupon_type,</span><br><span class="line">condition_amount,</span><br><span class="line">condition_num,</span><br><span class="line">activity_id,</span><br><span class="line">benefit_amount,</span><br><span 
class="line">benefit_discount,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time,</span><br><span class="line">range_type,</span><br><span class="line">spu_id,</span><br><span class="line">tm_id,</span><br><span class="line">category3_id,</span><br><span class="line">limit_num,</span><br><span class="line">from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) operate_time,</span><br><span class="line">from_unixtime(cast(expire_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) expire_time</span><br><span class="line">from ods.mall__coupon_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="e5a0afd7"></a></p><h4 id="9-2-3-活动维度表-全量"><a href="#9-2-3-活动维度表-全量" class="headerlink" title="9.2.3 活动维度表(全量)"></a>9.2.3 Activity dimension table (full load)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_activity_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_activity_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;id&#39;,</span><br><span class="line">&#96;activity_name&#96; string COMMENT &#39;activity name&#39;,</span><br><span class="line">&#96;activity_type&#96; string COMMENT &#39;activity type&#39;,</span><br><span class="line">&#96;condition_amount&#96; string COMMENT &#39;minimum spend amount&#39;,</span><br><span class="line">&#96;condition_num&#96; string COMMENT &#39;minimum item count&#39;,</span><br><span class="line">&#96;benefit_amount&#96; string COMMENT &#39;amount off&#39;,</span><br><span class="line">&#96;benefit_discount&#96; string COMMENT &#39;discount rate&#39;,</span><br><span class="line">&#96;benefit_level&#96; string COMMENT &#39;benefit level&#39;,</span><br><span class="line">&#96;start_time&#96; string COMMENT &#39;start time&#39;,</span><br><span class="line">&#96;end_time&#96; string COMMENT &#39;end time&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;</span><br><span class="line">  ) COMMENT &#39;activity dimension table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_activity_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_activity_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">info.id,</span><br><span 
class="line">info.activity_name,</span><br><span class="line">info.activity_type,</span><br><span class="line">rule.condition_amount,</span><br><span class="line">rule.condition_num,</span><br><span class="line">rule.benefit_amount,</span><br><span class="line">rule.benefit_discount,</span><br><span class="line">rule.benefit_level,</span><br><span class="line">from_unixtime(cast(info.start_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) start_time,</span><br><span class="line">from_unixtime(cast(info.end_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) end_time,</span><br><span class="line">from_unixtime(cast(info.create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__activity_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)info</span><br><span class="line">left join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__activity_rule where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)rule on info.id &#x3D; rule.activity_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="40a2f40c"></a></p><h4 id="9-2-4-地区维度表-特殊"><a href="#9-2-4-地区维度表-特殊" class="headerlink" title="9.2.4 地区维度表(特殊)"></a>9.2.4 Region dimension table (special case)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_base_province;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_base_province&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;id&#39;,</span><br><span class="line">&#96;province_name&#96; string COMMENT &#39;province name&#39;,</span><br><span class="line">&#96;area_code&#96; string COMMENT &#39;area code&#39;,</span><br><span class="line">&#96;iso_code&#96; string COMMENT &#39;ISO code&#39;,</span><br><span class="line">&#96;region_id&#96; string COMMENT &#39;region id&#39;,</span><br><span class="line">&#96;region_name&#96; string COMMENT &#39;region name&#39;</span><br><span class="line">  ) COMMENT &#39;region dimension table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_base_province&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_base_province</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">bp.id,</span><br><span class="line">bp.name,</span><br><span class="line">bp.area_code,</span><br><span class="line">bp.iso_code,</span><br><span class="line">bp.region_id,</span><br><span class="line">br.region_name</span><br><span class="line">from ods.mall__base_province bp</span><br><span class="line">join ods.mall__base_region br</span><br><span class="line">on bp.region_id&#x3D;br.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a 
name="6dc38b01"></a></p><h4 id="9-2-5-时间维度表-特殊"><a href="#9-2-5-时间维度表-特殊" class="headerlink" title="9.2.5 时间维度表(特殊)"></a>9.2.5 Date dimension table (special case)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_date_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_date_info&#96;(</span><br><span class="line">&#96;date_id&#96; string COMMENT &#39;date&#39;,</span><br><span class="line">&#96;week_id&#96; int COMMENT &#39;week&#39;,</span><br><span class="line">&#96;week_day&#96; int COMMENT &#39;day of week&#39;,</span><br><span class="line">&#96;day&#96; int COMMENT &#39;day of month&#39;,</span><br><span class="line">&#96;month&#96; int COMMENT &#39;month&#39;,</span><br><span class="line">&#96;quarter&#96; int COMMENT &#39;quarter&#39;,</span><br><span class="line">&#96;year&#96; int COMMENT &#39;year&#39;,</span><br><span class="line">&#96;is_workday&#96; int COMMENT &#39;whether it is a workday&#39;,</span><br><span class="line">&#96;holiday_id&#96; int COMMENT &#39;whether it is a holiday&#39;</span><br><span class="line">  ) COMMENT &#39;date dimension table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_date_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_date_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span 
class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">date_id,</span><br><span class="line">week_id,</span><br><span class="line">week_day,</span><br><span class="line">day,</span><br><span class="line">month,</span><br><span class="line">quarter,</span><br><span class="line">year,</span><br><span class="line">is_workday,</span><br><span class="line">holiday_id</span><br><span class="line">from ods.mall__date_info</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="2d4b2961"></a></p><h4 id="9-2-6-用户维度表-新增及变化-缓慢变化维-拉链表"><a href="#9-2-6-用户维度表-新增及变化-缓慢变化维-拉链表" class="headerlink" title="9.2.6 用户维度表(新增及变化-缓慢变化维-拉链表)"></a>9.2.6 User dimension table (new and changed rows: slowly changing dimension, zipper table)</h4><p><a name="467596bc"></a></p><h5 id="9-2-6-1-拉链表介绍"><a href="#9-2-6-1-拉链表介绍" class="headerlink" title="9.2.6.1 拉链表介绍"></a>9.2.6.1 Introduction to zipper tables</h5><blockquote><p>A zipper table records the lifecycle of every record. Once a record's lifecycle ends, a new record is started, with the current date as its effective start date.<br>If a record is still valid today, its effective end date is set to a very large value (e.g. 9999-99-99). The table below shows how Zhang San's phone number changed over time.</p></blockquote></li></ul><table><thead><tr><th>User ID</th><th>Name</th><th>Phone number</th><th>Start date</th><th>End date</th></tr></thead><tbody><tr><td>1</td><td>Zhang San</td><td>134XXXX5050</td><td>2019-01-01</td><td>2019-01-02</td></tr><tr><td>1</td><td>Zhang San</td><td>139XXXX3232</td><td>2019-01-03</td><td>2020-01-01</td></tr><tr><td>1</td><td>Zhang San</td><td>137XXXX7676</td><td>2020-01-02</td><td>9999-99-99</td></tr></tbody></table><ul><li><p>When to use it: the data changes, but most of it stays the same (i.e. a slowly changing dimension).</p><blockquote><p>For example, user information does change, but only a small fraction of it changes each day, so storing a full copy every day is inefficient.</p></blockquote></li><li><p>How to use a zipper table: filter on effective start date &lt;= a given date AND effective end date &gt;= that date to get a full snapshot of the data as of that date.</p></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445711-5342c8ec-2d69-476f-bb7d-80084793cdc7.png#align=left&display=inline&height=699&margin=%5Bobject%20Object%5D&originHeight=699&originWidth=1309&size=0&status=done&style=none&width=1309" alt></p><ul><li>How a zipper table is built up over time</li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445446108-d1824bc2-adf7-4c28-bc32-71e8ba274913.png#align=left&display=inline&height=806&margin=%5Bobject%20Object%5D&originHeight=806&originWidth=1435&size=0&status=done&style=none&width=1435" alt></p><ul><li>Build procedure<blockquote><p>Combine the existing user data with the rows that changed in MySQL that day to form a new temporary zipper table.<br>Overwrite the old zipper table with the temporary one.<br>This works around the fact that rows in ordinary (non-transactional) Hive tables cannot be updated in place.</p></blockquote></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445740-8f05faaa-a871-4d74-9b27-f4fa1da7e452.png#align=left&display=inline&height=771&margin=%5Bobject%20Object%5D&originHeight=771&originWidth=1437&size=0&status=done&style=none&width=1437" alt><br><a name="ded0213a"></a></p><h5 id="9-2-6-2-用户维度表"><a href="#9-2-6-2-用户维度表" class="headerlink" title="9.2.6.2 用户维度表"></a>9.2.6.2 User dimension table</h5><blockquote><p>Rows in the user table may be both inserted and modified on any given day, which makes it a slowly changing dimension, so the user dimension is stored as a zipper table.</p></blockquote><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_user_info_his;</span><br><span 
class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_user_info_his&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;user id&#39;,</span><br><span class="line">&#96;name&#96; string COMMENT &#39;name&#39;,</span><br><span class="line">&#96;birthday&#96; string COMMENT &#39;birthday&#39;,</span><br><span class="line">&#96;gender&#96; string COMMENT &#39;gender&#39;,</span><br><span class="line">&#96;email&#96; string COMMENT &#39;email&#39;,</span><br><span class="line">&#96;user_level&#96; string COMMENT &#39;user level&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;,</span><br><span class="line">&#96;operate_time&#96; string COMMENT &#39;operation time&#39;,</span><br><span class="line">&#96;start_date&#96; string COMMENT &#39;effective start date&#39;,</span><br><span class="line">&#96;end_date&#96; string COMMENT &#39;effective end date&#39;</span><br><span class="line">  ) COMMENT &#39;user zipper table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_user_info_his&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Create the temporary table (same structure as the main table)</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__dim_user_info_his_tmp;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__dim_user_info_his_tmp&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;user id&#39;,</span><br><span class="line">&#96;name&#96; string COMMENT &#39;name&#39;,</span><br><span class="line">&#96;birthday&#96; string COMMENT &#39;birthday&#39;,</span><br><span class="line">&#96;gender&#96; string COMMENT &#39;gender&#39;,</span><br><span class="line">&#96;email&#96; string COMMENT &#39;email&#39;,</span><br><span class="line">&#96;user_level&#96; string COMMENT &#39;user level&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;,</span><br><span class="line">&#96;operate_time&#96; string COMMENT &#39;operation time&#39;,</span><br><span class="line">&#96;start_date&#96; string COMMENT &#39;effective start date&#39;,</span><br><span class="line">&#96;end_date&#96; string COMMENT &#39;effective end date&#39;</span><br><span class="line">  ) COMMENT &#39;user zipper table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;dim_user_info_his_tmp&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Initialize the main table first (run this only once)</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_user_info_his</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">name,</span><br><span class="line">from_unixtime(cast(birthday&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) birthday,</span><br><span class="line">gender,</span><br><span class="line">email,</span><br><span class="line">user_level,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) 
create_time,</span><br><span class="line">from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) operate_time,</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">&#39;9999-99-99&#39;</span><br><span class="line">from ods.mall__user_info oi</span><br><span class="line">where oi.dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure></li><li><p>Compute and load the temporary table (run after the main table has been loaded)</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span 
class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_user_info_his_tmp</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">* </span><br><span class="line">from</span><br><span class="line">(      -- new and changed user records for the current date</span><br><span class="line">select</span><br><span class="line">cast(id as string) id,</span><br><span class="line">name,</span><br><span class="line">from_unixtime(cast(birthday&#x2F;1000 as 
bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) birthday,</span><br><span class="line">gender,</span><br><span class="line">email,</span><br><span class="line">user_level,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time,</span><br><span class="line">from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) operate_time,</span><br><span class="line">&#39;$db_date&#39; start_date,</span><br><span class="line">&#39;9999-99-99&#39; end_date</span><br><span class="line">from ods.mall__user_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">union all</span><br><span class="line"> -- existing history rows (time fields are already formatted strings); close out open records that changed today</span><br><span class="line">select</span><br><span class="line">uh.id,</span><br><span class="line">uh.name,</span><br><span class="line">uh.birthday,</span><br><span class="line">uh.gender,</span><br><span class="line">uh.email,</span><br><span class="line">uh.user_level,</span><br><span class="line">uh.create_time,</span><br><span class="line">uh.operate_time,</span><br><span class="line">uh.start_date,</span><br><span class="line">if(ui.id is not null and uh.end_date&#x3D;&#39;9999-99-99&#39;, date_add(ui.dt,-1),uh.end_date) end_date</span><br><span class="line">from dwd.mall__dim_user_info_his uh left join</span><br><span class="line">(</span><br><span class="line">        -- new and changed user records for the current date</span><br><span class="line">select</span><br><span class="line">cast(id as string) id,</span><br><span class="line">name,</span><br><span class="line">from_unixtime(cast(birthday&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) birthday,</span><br><span class="line">gender,</span><br><span class="line">email,</span><br><span 
class="line">user_level,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time,</span><br><span class="line">from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) operate_time,</span><br><span class="line">dt</span><br><span class="line">from ods.mall__user_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">) ui on uh.id&#x3D;ui.id</span><br><span class="line">)his</span><br><span class="line">order by his.id, start_date;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure></li><li><p>Load data: overwrite the zipper table with the temporary table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;dim_user_info_his</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] 
;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select * from dwd.mall__dim_user_info_his_tmp;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="7bac56d4"></a></p><h4 id="9-2-7-订单详情事实表-事务型快照事实表-新增"><a href="#9-2-7-订单详情事实表-事务型快照事实表-新增" class="headerlink" title="9.2.7 订单详情事实表(事务型快照事实表-新增)"></a>9.2.7 Order detail fact table (transactional snapshot fact table, incremental)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_order_detail;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_order_detail&#96;(</span><br><span class="line">  &#96;id&#96; bigint COMMENT &#39;id&#39;,</span><br><span class="line">  &#96;order_id&#96; bigint COMMENT &#39;order id&#39;,</span><br><span class="line">  &#96;user_id&#96; bigint COMMENT &#39;user id&#39;,</span><br><span class="line">  
&#96;sku_id&#96; bigint COMMENT &#39;sku_id&#39;,</span><br><span class="line">  &#96;sku_name&#96; string COMMENT &#39;sku name&#39;,</span><br><span class="line">  &#96;order_price&#96; decimal(10,2) COMMENT &#39;purchase price (sku price at order time)&#39;,</span><br><span class="line">  &#96;sku_num&#96; string COMMENT &#39;quantity purchased&#39;,</span><br><span class="line">  &#96;create_time&#96; bigint COMMENT &#39;creation time&#39;,</span><br><span class="line">  &#96;province_id&#96; string COMMENT &#39;province id&#39;,</span><br><span class="line">  &#96;total_amount&#96; decimal(20,2) COMMENT &#39;total order amount&#39;</span><br><span class="line">  ) COMMENT &#39;order detail table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_order_detail&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_order_detail</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">od.id, </span><br><span class="line">od.order_id, </span><br><span class="line">od.user_id, </span><br><span class="line">od.sku_id, </span><br><span class="line">od.sku_name, </span><br><span class="line">od.order_price, </span><br><span class="line">od.sku_num, </span><br><span class="line">od.create_time, </span><br><span class="line">oi.province_id, </span><br><span class="line">od.order_price*od.sku_num </span><br><span class="line">from (select * from ods.mall__order_detail where dt&#x3D;&#39;$db_date&#39; ) od </span><br><span class="line">join (select * from ods.mall__order_info where 
dt&#x3D;&#39;$db_date&#39; ) oi </span><br><span class="line">on od.order_id&#x3D;oi.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="038064ea"></a></p><h4 id="9-2-7-支付事实表-事务型快照事实表-新增"><a href="#9-2-7-支付事实表-事务型快照事实表-新增" class="headerlink" title="9.2.7 支付事实表(事务型快照事实表-新增)"></a>9.2.7 Payment fact table (transactional snapshot fact table, incremental)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_payment_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_payment_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;&#39;,</span><br><span class="line">&#96;out_trade_no&#96; string COMMENT &#39;external trade number&#39;,</span><br><span class="line">&#96;order_id&#96; string COMMENT &#39;order id&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;user id&#39;,</span><br><span class="line">&#96;alipay_trade_no&#96; string COMMENT &#39;Alipay transaction serial number&#39;,</span><br><span class="line">&#96;payment_amount&#96; decimal(16,2) COMMENT &#39;payment amount&#39;,</span><br><span class="line">&#96;subject&#96; string COMMENT &#39;transaction subject&#39;,</span><br><span 
class="line">&#96;payment_type&#96; string COMMENT &#39;payment type&#39;,</span><br><span class="line">&#96;payment_time&#96; string COMMENT &#39;payment time&#39;,</span><br><span class="line">&#96;province_id&#96; string COMMENT &#39;province id&#39;</span><br><span class="line">  ) COMMENT &#39;payment fact table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_payment_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span 
class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_payment_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">pi.id,</span><br><span class="line">pi.out_trade_no,</span><br><span class="line">pi.order_id,</span><br><span class="line">pi.user_id,</span><br><span class="line">pi.alipay_trade_no,</span><br><span class="line">pi.total_amount,</span><br><span class="line">pi.subject,</span><br><span class="line">pi.payment_type,</span><br><span class="line">from_unixtime(cast(pi.payment_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) payment_time,</span><br><span class="line">oi.province_id</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__payment_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)pi</span><br><span class="line">join</span><br><span 
class="line">(</span><br><span class="line">select id, province_id from ods.mall__order_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)oi</span><br><span class="line">on pi.order_id &#x3D; oi.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="ab3a3670"></a></p><h4 id="9-2-8-退款事实表-事务型快照事实表-新增"><a href="#9-2-8-退款事实表-事务型快照事实表-新增" class="headerlink" title="9.2.8 退款事实表(事务型快照事实表-新增)"></a>9.2.8 Refund fact table (transactional snapshot fact table, incremental)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_order_refund_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_order_refund_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;id&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;user id&#39;,</span><br><span class="line">&#96;order_id&#96; string COMMENT &#39;order id&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;product id&#39;,</span><br><span class="line">&#96;refund_type&#96; string COMMENT &#39;refund type&#39;,</span><br><span class="line">&#96;refund_num&#96; bigint COMMENT &#39;number of items refunded&#39;,</span><br><span 
class="line">&#96;refund_amount&#96; decimal(16,2) COMMENT &#39;refund amount&#39;,</span><br><span class="line">&#96;refund_reason_type&#96; string COMMENT &#39;refund reason type&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;refund time&#39;</span><br><span class="line">  ) COMMENT &#39;refund fact table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_order_refund_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span 
class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_order_refund_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">order_id,</span><br><span class="line">sku_id,</span><br><span class="line">refund_type,</span><br><span class="line">refund_num,</span><br><span class="line">refund_amount,</span><br><span class="line">refund_reason_type,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time</span><br><span class="line">from ods.mall__order_refund_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="6fc9b5b6"></a></p><h4 id="9-2-9-评价事实表-事务型快照事实表-新增"><a href="#9-2-9-评价事实表-事务型快照事实表-新增" class="headerlink" title="9.2.9 评价事实表(事务型快照事实表-新增)"></a>9.2.9 Comment fact table (transactional snapshot fact table, incremental)</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_comment_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_comment_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;编号&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户 ID&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;商品 sku&#39;,</span><br><span class="line">&#96;spu_id&#96; string COMMENT &#39;商品 spu&#39;,</span><br><span class="line">&#96;order_id&#96; string COMMENT &#39;订单 ID&#39;,</span><br><span class="line">&#96;appraise&#96; string COMMENT &#39;评价&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;评价时间&#39;</span><br><span class="line">  ) COMMENT &#39;评价事实表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_comment_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_comment_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span 
class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">spu_id,</span><br><span class="line">order_id,</span><br><span class="line">appraise,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time</span><br><span class="line">from ods.mall__comment_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="823e5d60"></a></p><h4 id="9-2-10-加购事实表-周期型快照事实表-全量"><a href="#9-2-10-加购事实表-周期型快照事实表-全量" class="headerlink" title="9.2.10 加购事实表(周期型快照事实表-全量)"></a>9.2.10 Add-to-cart fact table (periodic snapshot fact table, full load)</h4></li><li><p>Create table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_cart_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_cart_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;编号&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户 id&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;skuid&#39;,</span><br><span 
class="line">&#96;cart_price&#96; string COMMENT &#39;放入购物车时价格&#39;,</span><br><span class="line">&#96;sku_num&#96; string COMMENT &#39;数量&#39;,</span><br><span class="line">&#96;sku_name&#96; string COMMENT &#39;sku 名称 (冗余)&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;创建时间&#39;,</span><br><span class="line">&#96;operate_time&#96; string COMMENT &#39;修改时间&#39;,</span><br><span class="line">&#96;is_ordered&#96; string COMMENT &#39;是否已经下单。1 为已下单;0 为未下单&#39;,</span><br><span class="line">&#96;order_time&#96; string COMMENT &#39;下单时间&#39;</span><br><span class="line">  ) COMMENT &#39;加购事实表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_cart_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_cart_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">cart_price,</span><br><span class="line">sku_num,</span><br><span class="line">sku_name,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time,</span><br><span class="line">from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) operate_time,</span><br><span class="line">is_ordered,</span><br><span class="line">from_unixtime(cast(order_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) order_time</span><br><span class="line">from 
ods.mall__cart_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="2797a2b5"></a></p><h4 id="9-2-11-收藏事实表-周期型快照事实表-全量"><a href="#9-2-11-收藏事实表-周期型快照事实表-全量" class="headerlink" title="9.2.11 收藏事实表(周期型快照事实表-全量)"></a>9.2.11 Favorites fact table (periodic snapshot fact table, full load)</h4></li><li><p>Create table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_favor_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_favor_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;编号&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户 id&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;skuid&#39;,</span><br><span class="line">&#96;spu_id&#96; string COMMENT &#39;spuid&#39;,</span><br><span class="line">&#96;is_cancel&#96; string COMMENT &#39;是否取消&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;收藏时间&#39;,</span><br><span class="line">&#96;cancel_time&#96; string COMMENT &#39;取消时间&#39;</span><br><span class="line">  ) COMMENT &#39;收藏事实表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT 
&#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_favor_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_favor_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to yesterday</span><br><span class="line">if [ -n 
&quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">spu_id,</span><br><span class="line">is_cancel,</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) create_time,</span><br><span class="line">from_unixtime(cast(cancel_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;) cancel_time</span><br><span class="line">from ods.mall__favor_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="6f2874af"></a></p><h4 id="9-2-12-优惠券领用事实表-累积型快照事实表-新增及变化"><a href="#9-2-12-优惠券领用事实表-累积型快照事实表-新增及变化" class="headerlink" title="9.2.12 优惠券领用事实表(累积型快照事实表-新增及变化)"></a>9.2.12 Coupon usage fact table (accumulating snapshot fact table, new and changed records)</h4></li><li><p>Create table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_coupon_use;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_coupon_use&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;编号&#39;,</span><br><span class="line">&#96;coupon_id&#96; string COMMENT &#39;优惠券 ID&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;userid&#39;,</span><br><span class="line">&#96;order_id&#96; string COMMENT &#39;订单 id&#39;,</span><br><span class="line">&#96;coupon_status&#96; string COMMENT &#39;优惠券状态&#39;,</span><br><span class="line">&#96;get_time&#96; string COMMENT &#39;领取时间&#39;,</span><br><span class="line">&#96;using_time&#96; string COMMENT &#39;使用时间(下单)&#39;,</span><br><span class="line">&#96;used_time&#96; string COMMENT &#39;使用时间(支付)&#39;</span><br><span class="line">  ) COMMENT &#39;优惠券领用事实表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_coupon_use&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure><blockquote><p>The partition column dt is derived from the coupon claim time, get_time.<br>get_time is when the coupon is claimed. The row must exist from that point on, and the order time (using_time) and payment time (used_time) are filled in later, when the coupon is actually used.</p></blockquote></li></ul><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445877-2b94e8bb-aec2-445a-bea1-1e80c6f64ace.png#align=left&display=inline&height=836&margin=%5Bobject%20Object%5D&originHeight=836&originWidth=1647&size=0&status=done&style=none&width=1647" alt></p><ul><li><p>Load data</p><figure class="highlight 
plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span 
class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_coupon_use</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">set hive.exec.dynamic.partition.mode&#x3D;nonstrict;</span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt)</span><br><span class="line">select</span><br><span class="line">if(new.id is null,old.id,new.id) id,</span><br><span class="line">if(new.coupon_id is null,old.coupon_id,new.coupon_id) coupon_id,</span><br><span class="line">if(new.user_id is null,old.user_id,new.user_id) user_id,</span><br><span class="line">if(new.order_id is null,old.order_id,new.order_id) order_id,</span><br><span class="line">if(new.coupon_status is null,old.coupon_status,new.coupon_status) coupon_status,</span><br><span class="line">if(new.get_time is null,old.get_time,from_unixtime(cast(new.get_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)) get_time,</span><br><span class="line">if(new.using_time is null,old.using_time,from_unixtime(cast(new.using_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)) 
using_time,</span><br><span class="line">if(new.used_time is null,old.used_time,from_unixtime(cast(new.used_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)) used_time,</span><br><span class="line">if(new.get_time is null,old.get_time,from_unixtime(cast(new.get_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)) dt</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">coupon_id,</span><br><span class="line">user_id,</span><br><span class="line">order_id,</span><br><span class="line">coupon_status,</span><br><span class="line">get_time,</span><br><span class="line">using_time,</span><br><span class="line">used_time</span><br><span class="line">from dwd.mall__fact_coupon_use</span><br><span class="line">where dt in</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">from_unixtime(cast(get_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)</span><br><span class="line">from ods.mall__coupon_use</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">coupon_id,</span><br><span class="line">user_id,</span><br><span class="line">order_id,</span><br><span class="line">coupon_status,</span><br><span class="line">get_time,</span><br><span class="line">using_time,</span><br><span class="line">used_time</span><br><span class="line">from ods.mall__coupon_use</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)new</span><br><span class="line">on old.id&#x3D;new.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="3f232bd5"></a></p><h4 id="9-2-13-订单事实表-累积型快照事实表-新增及变化"><a 
href="#9-2-13-订单事实表-累积型快照事实表-新增及变化" class="headerlink" title="9.2.13 订单事实表(累积型快照事实表-新增及变化)"></a>9.2.13 Order fact table (accumulating snapshot fact table, new and changed records)</h4></li><li><p>Create table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwd.mall__fact_order_info;</span><br><span class="line">    </span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwd.mall__fact_order_info&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;订单编号&#39;,</span><br><span class="line">&#96;order_status&#96; string COMMENT &#39;订单状态&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户 id&#39;,</span><br><span class="line">&#96;out_trade_no&#96; string COMMENT &#39;支付流水号&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;创建时间(未支付状态)&#39;,</span><br><span class="line">&#96;payment_time&#96; string COMMENT &#39;支付时间(已支付状态)&#39;,</span><br><span class="line">&#96;cancel_time&#96; string COMMENT &#39;取消时间(已取消状态)&#39;,</span><br><span class="line">&#96;finish_time&#96; string COMMENT &#39;完成时间(已完成状态)&#39;,</span><br><span 
class="line">&#96;refund_time&#96; string COMMENT &#39;退款时间(退款中状态)&#39;,</span><br><span class="line">&#96;refund_finish_time&#96; string COMMENT &#39;退款完成时间(退款完成状态)&#39;,</span><br><span class="line">&#96;province_id&#96; string COMMENT &#39;省份 ID&#39;,</span><br><span class="line">&#96;activity_id&#96; string COMMENT &#39;活动 ID&#39;,</span><br><span class="line">&#96;original_total_amount&#96; string COMMENT &#39;原价金额&#39;,</span><br><span class="line">&#96;benefit_reduce_amount&#96; string COMMENT &#39;优惠金额&#39;,</span><br><span class="line">&#96;feight_fee&#96; string COMMENT &#39;运费&#39;,</span><br><span class="line">&#96;final_total_amount&#96; decimal(10,2) COMMENT &#39;订单金额&#39;</span><br><span class="line">  ) COMMENT &#39;订单事实表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwd&#x2F;mall&#x2F;fact_order_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445836-498738cf-a65f-4127-af18-9a1661eda52d.png#align=left&display=inline&height=522&margin=%5Bobject%20Object%5D&originHeight=522&originWidth=987&size=0&status=done&style=none&width=987" alt></p></li><li><p>Load data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span 
class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwd</span><br><span class="line">table_name&#x3D;fact_order_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the supplied date if one is given; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">set hive.exec.dynamic.partition.mode&#x3D;nonstrict;</span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt)</span><br><span class="line">select</span><br><span class="line">if(new.id is null,old.id,new.id),</span><br><span 
class="line">if(new.order_status is null,old.order_status,new.order_status),</span><br><span class="line">if(new.user_id is null,old.user_id,new.user_id),</span><br><span class="line">if(new.out_trade_no is null,old.out_trade_no,new.out_trade_no),</span><br><span class="line">if(new.tms[&#39;1001&#39;] is null,old.create_time,new.tms[&#39;1001&#39;]),-- 1001 = unpaid status</span><br><span class="line">if(new.tms[&#39;1002&#39;] is null,old.payment_time,new.tms[&#39;1002&#39;]),</span><br><span class="line">if(new.tms[&#39;1003&#39;] is null,old.cancel_time,new.tms[&#39;1003&#39;]),</span><br><span class="line">if(new.tms[&#39;1004&#39;] is null,old.finish_time,new.tms[&#39;1004&#39;]),</span><br><span class="line">if(new.tms[&#39;1005&#39;] is null,old.refund_time,new.tms[&#39;1005&#39;]),</span><br><span class="line">if(new.tms[&#39;1006&#39;] is null,old.refund_finish_time,new.tms[&#39;1006&#39;]),</span><br><span class="line">if(new.province_id is null,old.province_id,new.province_id),</span><br><span class="line">if(new.activity_id is null,old.activity_id,new.activity_id),</span><br><span class="line">if(new.original_total_amount is null,old.original_total_amount,new.original_total_amount),</span><br><span class="line">if(new.benefit_reduce_amount is null,old.benefit_reduce_amount,new.benefit_reduce_amount),</span><br><span class="line">if(new.feight_fee is null,old.feight_fee,new.feight_fee),</span><br><span class="line">if(new.final_total_amount is null,old.final_total_amount,new.final_total_amount),</span><br><span class="line">date_format(if(new.tms[&#39;1001&#39;] is null,old.create_time,new.tms[&#39;1001&#39;]),&#39;yyyy-MM-dd&#39;)</span><br><span class="line">from</span><br><span 
class="line">(</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">order_status,</span><br><span class="line">user_id,</span><br><span class="line">out_trade_no,</span><br><span class="line">create_time,</span><br><span class="line">payment_time,</span><br><span class="line">cancel_time,</span><br><span class="line">finish_time,</span><br><span class="line">refund_time,</span><br><span class="line">refund_finish_time,</span><br><span class="line">province_id,</span><br><span class="line">activity_id,</span><br><span class="line">original_total_amount,</span><br><span class="line">benefit_reduce_amount,</span><br><span class="line">feight_fee,</span><br><span class="line">final_total_amount</span><br><span class="line">from dwd.mall__fact_order_info</span><br><span class="line">where dt in </span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">from_unixtime(cast(create_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd&#39;)</span><br><span class="line">from ods.mall__order_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">info.id,</span><br><span class="line">info.order_status,</span><br><span class="line">info.user_id,</span><br><span class="line">info.out_trade_no,</span><br><span class="line">info.province_id,</span><br><span class="line">act.activity_id,</span><br><span class="line">log.tms,</span><br><span class="line">info.original_total_amount,</span><br><span class="line">info.benefit_reduce_amount,</span><br><span class="line">info.feight_fee,</span><br><span class="line">info.final_total_amount</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">order_id,</span><br><span 
class="line">str_to_map(concat_ws(&#39;,&#39;,collect_set(concat(order_status,&#39;&#x3D;&#39;,from_unixtime(cast(operate_time&#x2F;1000 as bigint),&#39;yyyy-MM-dd HH:mm:ss&#39;)))),&#39;,&#39;,&#39;&#x3D;&#39;)</span><br><span class="line">tms</span><br><span class="line">from ods.mall__order_status_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by order_id</span><br><span class="line">)log</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__order_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)info</span><br><span class="line">on log.order_id&#x3D;info.id</span><br><span class="line">left join</span><br><span class="line">(</span><br><span class="line">select * from ods.mall__activity_order where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)act</span><br><span class="line">on log.order_id&#x3D;act.order_id</span><br><span class="line">)new</span><br><span class="line">on old.id&#x3D;new.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle37"></a></p><h2 id="10-DWS层构建"><a href="#10-DWS层构建" class="headerlink" title="10 DWS层构建"></a>10 Building the DWS Layer</h2><blockquote><p>Compression is no longer applied at this layer. Compression saves disk space but costs CPU time during computation, and DWS tables are queried frequently, so computational efficiency takes priority here. This layer mainly aggregates each day's activity by subject.</p></blockquote></li></ul><p><a name="blogTitle38"></a></p><h3 id="10-1-每日设备行为-用户行为"><a href="#10-1-每日设备行为-用户行为" class="headerlink" title="10.1 每日设备行为(用户行为)"></a>10.1 Daily Device Activity (User Behavior)</h3><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dws.mall__uv_detail_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dws.mall__uv_detail_daycount&#96;(</span><br><span class="line">&#96;mid_id&#96; string COMMENT &#39;设备唯一标识&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户标识&#39;,</span><br><span class="line">&#96;version_code&#96; string COMMENT &#39;程序版本号&#39;,</span><br><span class="line">&#96;version_name&#96; string COMMENT &#39;程序版本名&#39;,</span><br><span class="line">&#96;lang&#96; string COMMENT &#39;系统语言&#39;,</span><br><span class="line">&#96;source&#96; string COMMENT &#39;渠道号&#39;,</span><br><span class="line">&#96;os&#96; string COMMENT &#39;安卓系统版本&#39;,</span><br><span class="line">&#96;area&#96; string COMMENT &#39;区域&#39;,</span><br><span class="line">&#96;model&#96; string COMMENT &#39;手机型号&#39;,</span><br><span class="line">&#96;brand&#96; string COMMENT &#39;手机品牌&#39;,</span><br><span class="line">&#96;sdk_version&#96; string COMMENT &#39;sdkVersion&#39;,</span><br><span class="line">&#96;gmail&#96; string COMMENT &#39;gmail&#39;,</span><br><span class="line">&#96;height_width&#96; string COMMENT &#39;屏幕宽高&#39;,</span><br><span class="line">&#96;app_time&#96; string COMMENT &#39;客户端日志产生时的时间&#39;,</span><br><span class="line">&#96;network&#96; string COMMENT &#39;网络模式&#39;,</span><br><span class="line">&#96;lng&#96; string COMMENT 
&#39;经度&#39;,</span><br><span class="line">&#96;lat&#96; string COMMENT &#39;纬度&#39;,</span><br><span class="line">&#96;login_count&#96; bigint COMMENT &#39;活跃次数&#39;</span><br><span class="line">  ) COMMENT &#39;每日设备行为表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;uv_detail_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;uv_detail_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one was given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">PARTITION (dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(user_id)) user_id,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(version_code)) version_code,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(version_name)) version_name,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(lang)) lang,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(source)) source,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(os)) os,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(area)) area,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(model)) model,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(brand)) brand,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(sdk_version)) 
sdk_version,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(gmail)) gmail,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(height_width)) height_width,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(app_time)) app_time,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(network)) network,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(lng)) lng,</span><br><span class="line">concat_ws(&#39;|&#39;, collect_set(lat)) lat,</span><br><span class="line">count(*) login_count</span><br><span class="line">from dwd.mall__start_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by mid_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle39"></a></p><h3 id="10-2-每日会员行为-业务"><a href="#10-2-每日会员行为-业务" class="headerlink" title="10.2 每日会员行为(业务)"></a>10.2 Daily Member Activity (Business Data)</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dws.mall__user_action_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dws.mall__user_action_daycount&#96;(</span><br><span class="line">user_id string comment &#39;用户 id&#39;,</span><br><span class="line">login_count bigint comment &#39;登录次数&#39;,</span><br><span class="line">cart_count 
bigint comment &#39;加入购物车次数&#39;,</span><br><span class="line">cart_amount double comment &#39;加入购物车金额&#39;,</span><br><span class="line">order_count bigint comment &#39;下单次数&#39;,</span><br><span class="line">order_amount decimal(16,2) comment &#39;下单金额&#39;,</span><br><span class="line">payment_count bigint comment &#39;支付次数&#39;,</span><br><span class="line">payment_amount decimal(16,2) comment &#39;支付金额&#39;</span><br><span class="line">  ) COMMENT &#39;每日会员行为表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;user_action_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span 
class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;user_action_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one was given; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">with</span><br><span class="line">tmp_login as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">count(*) login_count</span><br><span class="line">from 
dwd.mall__start_log</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and user_id is not null</span><br><span class="line">group by user_id</span><br><span class="line">),</span><br><span class="line">tmp_cart as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">count(*) cart_count,</span><br><span class="line">sum(cart_price*sku_num) cart_amount</span><br><span class="line">from dwd.mall__fact_cart_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and user_id is not null</span><br><span class="line">and date_format(create_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by user_id</span><br><span class="line">),</span><br><span class="line">tmp_order as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">count(*) order_count,</span><br><span class="line">sum(final_total_amount) order_amount</span><br><span class="line">from dwd.mall__fact_order_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by user_id</span><br><span class="line">) ,</span><br><span class="line">tmp_payment as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">count(*) payment_count,</span><br><span class="line">sum(payment_amount) payment_amount</span><br><span class="line">from dwd.mall__fact_payment_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by user_id</span><br><span class="line">)</span><br><span class="line">insert overwrite table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">user_actions.user_id,</span><br><span 
class="line">sum(user_actions.login_count),</span><br><span class="line">sum(user_actions.cart_count),</span><br><span class="line">sum(user_actions.cart_amount),</span><br><span class="line">sum(user_actions.order_count),</span><br><span class="line">sum(user_actions.order_amount),</span><br><span class="line">sum(user_actions.payment_count),</span><br><span class="line">sum(user_actions.payment_amount)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">login_count,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_amount,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_amount</span><br><span class="line">from</span><br><span class="line">tmp_login</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">0 login_count,</span><br><span class="line">cart_count,</span><br><span class="line">cart_amount,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_amount</span><br><span class="line">from</span><br><span class="line">tmp_cart</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">0 login_count,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_amount,</span><br><span class="line">order_count,</span><br><span class="line">order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_amount</span><br><span class="line">from tmp_order</span><br><span class="line">union all</span><br><span class="line">select</span><br><span 
class="line">user_id,</span><br><span class="line">0 login_count,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_amount,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_amount,</span><br><span class="line">payment_count,</span><br><span class="line">payment_amount</span><br><span class="line">from tmp_payment</span><br><span class="line">) user_actions</span><br><span class="line">group by user_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle40"></a></p><h3 id="10-3-每日商品行为-业务"><a href="#10-3-每日商品行为-业务" class="headerlink" title="10.3 每日商品行为(业务)"></a>10.3 Daily Product Activity (Business Data)</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dws.mall__sku_action_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dws.mall__sku_action_daycount&#96;(</span><br><span class="line">sku_id string comment &#39;sku_id&#39;,</span><br><span class="line">order_count bigint comment 
&#39;被下单次数&#39;,</span><br><span class="line">order_num bigint comment &#39;被下单件数&#39;,</span><br><span class="line">order_amount decimal(16,2) comment &#39;被下单金额&#39;,</span><br><span class="line">payment_count bigint comment &#39;被支付次数&#39;,</span><br><span class="line">payment_num bigint comment &#39;被支付件数&#39;,</span><br><span class="line">payment_amount decimal(16,2) comment &#39;被支付金额&#39;,</span><br><span class="line">refund_count bigint comment &#39;被退款次数&#39;,</span><br><span class="line">refund_num bigint comment &#39;被退款件数&#39;,</span><br><span class="line">refund_amount decimal(16,2) comment &#39;被退款金额&#39;,</span><br><span class="line">cart_count bigint comment &#39;被加入购物车次数&#39;,</span><br><span class="line">cart_num bigint comment &#39;被加入购物车件数&#39;,</span><br><span class="line">favor_count bigint comment &#39;被收藏次数&#39;,</span><br><span class="line">appraise_good_count bigint comment &#39;好评数&#39;,</span><br><span class="line">appraise_mid_count bigint comment &#39;中评数&#39;,</span><br><span class="line">appraise_bad_count bigint comment &#39;差评数&#39;,</span><br><span class="line">appraise_default_count bigint comment &#39;默认评价数&#39;</span><br><span class="line">  ) COMMENT &#39;每日商品行为表&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;sku_action_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span 
class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span 
class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span 
class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;sku_action_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one was given; otherwise default to the day before today</span><br><span class="line">if 
[ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">with</span><br><span class="line">tmp_order as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">cast(sku_id as string) sku_id,</span><br><span class="line">count(*) order_count,</span><br><span class="line">sum(sku_num) order_num,</span><br><span class="line">sum(total_amount) order_amount</span><br><span class="line">from dwd.mall__fact_order_detail</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">),</span><br><span class="line">tmp_payment as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">cast(sku_id as string) sku_id,</span><br><span class="line">count(*) payment_count,</span><br><span class="line">sum(sku_num) payment_num,</span><br><span class="line">sum(total_amount) payment_amount</span><br><span class="line">from dwd.mall__fact_order_detail</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and order_id in</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">id</span><br><span class="line">from dwd.mall__fact_order_info</span><br><span class="line">where (dt&#x3D;&#39;$db_date&#39; or dt&#x3D;date_add(&#39;$db_date&#39;,-1))</span><br><span class="line">and date_format(payment_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">)</span><br><span class="line">group by sku_id</span><br><span class="line">),</span><br><span class="line">tmp_refund as</span><br><span class="line">(</span><br><span class="line">select</span><br><span 
class="line">cast(sku_id as string) sku_id,</span><br><span class="line">count(*) refund_count,</span><br><span class="line">sum(refund_num) refund_num,</span><br><span class="line">sum(refund_amount) refund_amount</span><br><span class="line">from dwd.mall__fact_order_refund_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">),</span><br><span class="line">tmp_cart as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">cast(sku_id as string) sku_id,</span><br><span class="line">count(*) cart_count,</span><br><span class="line">sum(sku_num) cart_num</span><br><span class="line">from dwd.mall__fact_cart_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and date_format(create_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">),</span><br><span class="line">tmp_favor as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">cast(sku_id as string) sku_id,</span><br><span class="line">count(*) favor_count</span><br><span class="line">from dwd.mall__fact_favor_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and date_format(create_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">),</span><br><span class="line">tmp_appraise as</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">cast(sku_id as string) sku_id,</span><br><span class="line">sum(if(appraise&#x3D;&#39;1201&#39;,1,0)) appraise_good_count,</span><br><span class="line">sum(if(appraise&#x3D;&#39;1202&#39;,1,0)) appraise_mid_count,</span><br><span class="line">sum(if(appraise&#x3D;&#39;1203&#39;,1,0)) appraise_bad_count,</span><br><span 
class="line">sum(if(appraise&#x3D;&#39;1204&#39;,1,0)) appraise_default_count</span><br><span class="line">from dwd.mall__fact_comment_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">)</span><br><span class="line">insert overwrite table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">sum(order_count),</span><br><span class="line">sum(order_num),</span><br><span class="line">sum(order_amount),</span><br><span class="line">sum(payment_count),</span><br><span class="line">sum(payment_num),</span><br><span class="line">sum(payment_amount),</span><br><span class="line">sum(refund_count),</span><br><span class="line">sum(refund_num),</span><br><span class="line">sum(refund_amount),</span><br><span class="line">sum(cart_count),</span><br><span class="line">sum(cart_num),</span><br><span class="line">sum(favor_count),</span><br><span class="line">sum(appraise_good_count),</span><br><span class="line">sum(appraise_mid_count),</span><br><span class="line">sum(appraise_bad_count),</span><br><span class="line">sum(appraise_default_count)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">order_count,</span><br><span class="line">order_num,</span><br><span class="line">order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_num,</span><br><span class="line">0 payment_amount,</span><br><span class="line">0 refund_count,</span><br><span class="line">0 refund_num,</span><br><span class="line">0 refund_amount,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_num,</span><br><span class="line">0 favor_count,</span><br><span class="line">0 appraise_good_count,</span><br><span class="line">0 
appraise_mid_count,</span><br><span class="line">0 appraise_bad_count,</span><br><span class="line">0 appraise_default_count</span><br><span class="line">from tmp_order</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_num,</span><br><span class="line">0 order_amount,</span><br><span class="line">payment_count,</span><br><span class="line">payment_num,</span><br><span class="line">payment_amount,</span><br><span class="line">0 refund_count,</span><br><span class="line">0 refund_num,</span><br><span class="line">0 refund_amount,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_num,</span><br><span class="line">0 favor_count,</span><br><span class="line">0 appraise_good_count,</span><br><span class="line">0 appraise_mid_count,</span><br><span class="line">0 appraise_bad_count,</span><br><span class="line">0 appraise_default_count</span><br><span class="line">from tmp_payment</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_num,</span><br><span class="line">0 order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_num,</span><br><span class="line">0 payment_amount,</span><br><span class="line">refund_count,</span><br><span class="line">refund_num,</span><br><span class="line">refund_amount,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_num,</span><br><span class="line">0 favor_count,</span><br><span class="line">0 appraise_good_count,</span><br><span class="line">0 appraise_mid_count,</span><br><span class="line">0 appraise_bad_count,</span><br><span class="line">0 appraise_default_count</span><br><span class="line">from tmp_refund</span><br><span class="line">union 
all</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_num,</span><br><span class="line">0 order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_num,</span><br><span class="line">0 payment_amount,</span><br><span class="line">0 refund_count,</span><br><span class="line">0 refund_num,</span><br><span class="line">0 refund_amount,</span><br><span class="line">cart_count,</span><br><span class="line">cart_num,</span><br><span class="line">0 favor_count,</span><br><span class="line">0 appraise_good_count,</span><br><span class="line">0 appraise_mid_count,</span><br><span class="line">0 appraise_bad_count,</span><br><span class="line">0 appraise_default_count</span><br><span class="line">from tmp_cart</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_num,</span><br><span class="line">0 order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_num,</span><br><span class="line">0 payment_amount,</span><br><span class="line">0 refund_count,</span><br><span class="line">0 refund_num,</span><br><span class="line">0 refund_amount,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_num,</span><br><span class="line">favor_count,</span><br><span class="line">0 appraise_good_count,</span><br><span class="line">0 appraise_mid_count,</span><br><span class="line">0 appraise_bad_count,</span><br><span class="line">0 appraise_default_count</span><br><span class="line">from tmp_favor</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">0 order_count,</span><br><span class="line">0 order_num,</span><br><span class="line">0 
order_amount,</span><br><span class="line">0 payment_count,</span><br><span class="line">0 payment_num,</span><br><span class="line">0 payment_amount,</span><br><span class="line">0 refund_count,</span><br><span class="line">0 refund_num,</span><br><span class="line">0 refund_amount,</span><br><span class="line">0 cart_count,</span><br><span class="line">0 cart_num,</span><br><span class="line">0 favor_count,</span><br><span class="line">appraise_good_count,</span><br><span class="line">appraise_mid_count,</span><br><span class="line">appraise_bad_count,</span><br><span class="line">appraise_default_count</span><br><span class="line">from tmp_appraise</span><br><span class="line">)tmp</span><br><span class="line">group by sku_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle41"></a></p><h3 id="10-4-每日优惠券统计-业务"><a href="#10-4-每日优惠券统计-业务" class="headerlink" title="10.4 每日优惠券统计(业务)"></a>10.4 Daily Coupon Statistics (Business)</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">drop table 
if exists dws.mall__coupon_use_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dws.mall__coupon_use_daycount&#96;(</span><br><span class="line">&#96;coupon_id&#96; string COMMENT &#39;coupon ID&#39;,</span><br><span class="line">&#96;coupon_name&#96; string COMMENT &#39;coupon name&#39;,</span><br><span class="line">&#96;coupon_type&#96; string COMMENT &#39;coupon type: 1 cash, 2 discount, 3 spend-threshold reduction, 4 quantity-threshold discount&#39;,</span><br><span class="line">&#96;condition_amount&#96; string COMMENT &#39;minimum spend amount&#39;,</span><br><span class="line">&#96;condition_num&#96; string COMMENT &#39;minimum item count&#39;,</span><br><span class="line">&#96;activity_id&#96; string COMMENT &#39;activity ID&#39;,</span><br><span class="line">&#96;benefit_amount&#96; string COMMENT &#39;reduction amount&#39;,</span><br><span class="line">&#96;benefit_discount&#96; string COMMENT &#39;discount rate&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;,</span><br><span class="line">&#96;range_type&#96; string COMMENT &#39;scope type: 1 product, 2 category, 3 brand&#39;,</span><br><span class="line">&#96;spu_id&#96; string COMMENT &#39;product id&#39;,</span><br><span class="line">&#96;tm_id&#96; string COMMENT &#39;brand id&#39;,</span><br><span class="line">&#96;category3_id&#96; string COMMENT &#39;category id&#39;,</span><br><span class="line">&#96;limit_num&#96; string COMMENT &#39;maximum number of claims&#39;,</span><br><span class="line">&#96;get_count&#96; bigint COMMENT &#39;times claimed&#39;,</span><br><span class="line">&#96;using_count&#96; bigint COMMENT &#39;times used (ordered)&#39;,</span><br><span class="line">&#96;used_count&#96; bigint COMMENT &#39;times used (paid)&#39;</span><br><span class="line">  ) COMMENT &#39;daily coupon statistics table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;coupon_use_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span 
class="line">56</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;coupon_use_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in via $date if it is set; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">cu.coupon_id,</span><br><span class="line">ci.coupon_name,</span><br><span class="line">ci.coupon_type,</span><br><span class="line">ci.condition_amount,</span><br><span class="line">ci.condition_num,</span><br><span class="line">ci.activity_id,</span><br><span class="line">ci.benefit_amount,</span><br><span class="line">ci.benefit_discount,</span><br><span class="line">ci.create_time,</span><br><span class="line">ci.range_type,</span><br><span class="line">ci.spu_id,</span><br><span class="line">ci.tm_id,</span><br><span class="line">ci.category3_id,</span><br><span class="line">ci.limit_num,</span><br><span class="line">cu.get_count,</span><br><span class="line">cu.using_count,</span><br><span class="line">cu.used_count</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">coupon_id,</span><br><span 
class="line">sum(if(date_format(get_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">get_count,</span><br><span class="line">sum(if(date_format(using_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">using_count,</span><br><span class="line">sum(if(date_format(used_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">used_count</span><br><span class="line">from dwd.mall__fact_coupon_use</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by coupon_id</span><br><span class="line">)cu</span><br><span class="line">left join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwd.mall__dim_coupon_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)ci on cu.coupon_id&#x3D;ci.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle42"></a></p><h3 id="10-5-每日活动统计-业务"><a href="#10-5-每日活动统计-业务" class="headerlink" title="10.5 每日活动统计(业务)"></a>10.5 Daily Activity Statistics (Business)</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dws.mall__activity_info_daycount;</span><br><span 
class="line">CREATE EXTERNAL TABLE &#96;dws.mall__activity_info_daycount&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;ID&#39;,</span><br><span class="line">&#96;activity_name&#96; string COMMENT &#39;activity name&#39;,</span><br><span class="line">&#96;activity_type&#96; string COMMENT &#39;activity type&#39;,</span><br><span class="line">&#96;start_time&#96; string COMMENT &#39;start time&#39;,</span><br><span class="line">&#96;end_time&#96; string COMMENT &#39;end time&#39;,</span><br><span class="line">&#96;create_time&#96; string COMMENT &#39;creation time&#39;,</span><br><span class="line">&#96;order_count&#96; bigint COMMENT &#39;order count&#39;,</span><br><span class="line">&#96;payment_count&#96; bigint COMMENT &#39;payment count&#39;</span><br><span class="line">  ) COMMENT &#39;daily activity statistics table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;activity_info_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;activity_info_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in via $date if it is set; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">oi.activity_id,</span><br><span class="line">ai.activity_name,</span><br><span 
class="line">ai.activity_type,</span><br><span class="line">ai.start_time,</span><br><span class="line">ai.end_time,</span><br><span class="line">ai.create_time,</span><br><span class="line">oi.order_count,</span><br><span class="line">oi.payment_count</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">activity_id,</span><br><span class="line">sum(if(date_format(create_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">order_count,</span><br><span class="line">sum(if(date_format(payment_time,&#39;yyyy-MM-dd&#39;)&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">payment_count</span><br><span class="line">from dwd.mall__fact_order_info</span><br><span class="line">where (dt&#x3D;&#39;$db_date&#39; or dt&#x3D;date_add(&#39;$db_date&#39;,-1))</span><br><span class="line">and activity_id is not null</span><br><span class="line">group by activity_id</span><br><span class="line">)oi</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwd.mall__dim_activity_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)ai</span><br><span class="line">on oi.activity_id&#x3D;ai.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle43"></a></p><h3 id="10-6-每日购买行为-业务"><a href="#10-6-每日购买行为-业务" class="headerlink" title="10.6 每日购买行为(业务)"></a>10.6 Daily Purchase Behavior (Business)</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dws.mall__sale_detail_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dws.mall__sale_detail_daycount&#96;(</span><br><span class="line">user_id string comment &#39;user id&#39;,</span><br><span class="line">sku_id string comment &#39;SKU id&#39;,</span><br><span class="line">user_gender string comment &#39;user gender&#39;,</span><br><span class="line">user_age string comment &#39;user age&#39;,</span><br><span class="line">user_level string comment &#39;user level&#39;,</span><br><span class="line">order_price decimal(10,2) comment &#39;product price&#39;,</span><br><span class="line">sku_name string comment &#39;product name&#39;,</span><br><span class="line">sku_tm_id string comment &#39;brand id&#39;,</span><br><span class="line">sku_category3_id string comment &#39;level-3 category id&#39;,</span><br><span class="line">sku_category2_id string comment &#39;level-2 category id&#39;,</span><br><span class="line">sku_category1_id string comment &#39;level-1 category id&#39;,</span><br><span class="line">sku_category3_name string comment &#39;level-3 category name&#39;,</span><br><span class="line">sku_category2_name string comment &#39;level-2 category name&#39;,</span><br><span class="line">sku_category1_name string comment &#39;level-1 category name&#39;,</span><br><span class="line">spu_id string comment &#39;product SPU&#39;,</span><br><span class="line">sku_num int comment 
&#39;quantity purchased&#39;,</span><br><span class="line">order_count bigint comment &#39;orders placed that day&#39;,</span><br><span class="line">order_amount decimal(16,2) comment &#39;order amount that day&#39;</span><br><span class="line">  ) COMMENT &#39;daily purchase behavior table&#39;</span><br><span class="line">PARTITIONED BY (</span><br><span class="line">  &#96;dt&#96; String COMMENT &#39;partition&#39;</span><br><span class="line">)</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dws&#x2F;mall&#x2F;sale_detail_daycount&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dws</span><br><span class="line">table_name&#x3D;sale_detail_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in via $date if it is set; otherwise default to yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name partition(dt&#x3D;&#39;$db_date&#39;)</span><br><span class="line">select</span><br><span class="line">op.user_id,</span><br><span class="line">op.sku_id,</span><br><span class="line">ui.gender,</span><br><span class="line">months_between(&#39;$db_date&#39;, 
ui.birthday)&#x2F;12 age,</span><br><span class="line">ui.user_level,</span><br><span class="line">si.price,</span><br><span class="line">si.sku_name,</span><br><span class="line">si.tm_id,</span><br><span class="line">si.category3_id,</span><br><span class="line">si.category2_id,</span><br><span class="line">si.category1_id,</span><br><span class="line">si.category3_name,</span><br><span class="line">si.category2_name,</span><br><span class="line">si.category1_name,</span><br><span class="line">si.spu_id,</span><br><span class="line">op.sku_num,</span><br><span class="line">op.order_count,</span><br><span class="line">op.order_amount</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">sku_id,</span><br><span class="line">sum(sku_num) sku_num,</span><br><span class="line">count(*) order_count,</span><br><span class="line">sum(total_amount) order_amount</span><br><span class="line">from dwd.mall__fact_order_detail</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by user_id, sku_id</span><br><span class="line">)op</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwd.mall__dim_user_info_his</span><br><span class="line">where end_date&#x3D;&#39;9999-99-99&#39;</span><br><span class="line">)ui on op.user_id &#x3D; ui.id</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwd.mall__dim_sku_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)si on op.sku_id &#x3D; si.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle44"></a></p><h2 
id="11-DWT层构建"><a href="#11-DWT层构建" class="headerlink" title="11 DWT层构建"></a>11 Building the DWT Layer</h2><blockquote><p>This layer aggregates the daily data of the DWS layer. Its tables are not partitioned and not compressed, and their contents are overwritten every day.</p></blockquote></li></ul><p><a name="blogTitle45"></a></p><h3 id="11-1-设备主题宽表"><a href="#11-1-设备主题宽表" class="headerlink" title="11.1 设备主题宽表"></a>11.1 Device Topic Wide Table</h3><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwt.mall__uv_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwt.mall__uv_topic&#96;(</span><br><span class="line">&#96;mid_id&#96; string COMMENT &#39;设备唯一标识&#39;,</span><br><span class="line">&#96;user_id&#96; string COMMENT &#39;用户标识&#39;,</span><br><span class="line">&#96;version_code&#96; string COMMENT &#39;程序版本号&#39;,</span><br><span class="line">&#96;version_name&#96; string COMMENT &#39;程序版本名&#39;,</span><br><span class="line">&#96;lang&#96; string COMMENT &#39;系统语言&#39;,</span><br><span class="line">&#96;source&#96; string COMMENT &#39;渠道号&#39;,</span><br><span class="line">&#96;os&#96; string COMMENT &#39;安卓系统版本&#39;,</span><br><span 
class="line">&#96;area&#96; string COMMENT &#39;区域&#39;,</span><br><span class="line">&#96;model&#96; string COMMENT &#39;手机型号&#39;,</span><br><span class="line">&#96;brand&#96; string COMMENT &#39;手机品牌&#39;,</span><br><span class="line">&#96;sdk_version&#96; string COMMENT &#39;sdkVersion&#39;,</span><br><span class="line">&#96;gmail&#96; string COMMENT &#39;gmail&#39;,</span><br><span class="line">&#96;height_width&#96; string COMMENT &#39;屏幕宽高&#39;,</span><br><span class="line">&#96;app_time&#96; string COMMENT &#39;客户端日志产生时的时间&#39;,</span><br><span class="line">&#96;network&#96; string  COMMENT &#39;网络模式&#39;,</span><br><span class="line">&#96;lng&#96; string COMMENT &#39;经度&#39;,</span><br><span class="line">&#96;lat&#96; string COMMENT &#39;纬度&#39;,</span><br><span class="line">&#96;login_date_first&#96; string comment &#39;首次活跃时间&#39;,</span><br><span class="line">&#96;login_date_last&#96; string comment &#39;末次活跃时间&#39;,</span><br><span class="line">&#96;login_day_count&#96; bigint comment &#39;当日活跃次数&#39;,</span><br><span class="line">&#96;login_count&#96; bigint comment &#39;累积活跃天数&#39;</span><br><span class="line">  ) COMMENT &#39;设备主题宽表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwt&#x2F;mall&#x2F;uv_topic&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwt</span><br><span class="line">table_name&#x3D;uv_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span 
class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">nvl(new.mid_id,old.mid_id),</span><br><span class="line">nvl(new.user_id,old.user_id),</span><br><span class="line">nvl(new.version_code,old.version_code),</span><br><span class="line">nvl(new.version_name,old.version_name),</span><br><span class="line">nvl(new.lang,old.lang),</span><br><span class="line">nvl(new.source,old.source),</span><br><span class="line">nvl(new.os,old.os),</span><br><span class="line">nvl(new.area,old.area),</span><br><span class="line">nvl(new.model,old.model),</span><br><span class="line">nvl(new.brand,old.brand),</span><br><span class="line">nvl(new.sdk_version,old.sdk_version),</span><br><span class="line">nvl(new.gmail,old.gmail),</span><br><span class="line">nvl(new.height_width,old.height_width),</span><br><span class="line">nvl(new.app_time,old.app_time),</span><br><span class="line">nvl(new.network,old.network),</span><br><span class="line">nvl(new.lng,old.lng),</span><br><span class="line">nvl(new.lat,old.lat),</span><br><span class="line">if(old.mid_id is null,&#39;$db_date&#39;,old.login_date_first),</span><br><span class="line">if(new.mid_id is not null,&#39;$db_date&#39;,old.login_date_last),</span><br><span class="line">if(new.mid_id is not null, new.login_count,0),</span><br><span class="line">nvl(old.login_count,0)+if(new.login_count&gt;0,1,0)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from 
dws.mall__uv_detail_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)new</span><br><span class="line">on old.mid_id&#x3D;new.mid_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle46"></a></p><h3 id="11-2-会员主题宽表"><a href="#11-2-会员主题宽表" class="headerlink" title="11.2 会员主题宽表"></a>11.2 Member Topic Wide Table</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwt.mall__user_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwt.mall__user_topic&#96;(</span><br><span class="line">user_id string comment &#39;用户 id&#39;,</span><br><span class="line">login_date_first string comment &#39;首次登录时间&#39;,</span><br><span class="line">login_date_last string comment &#39;末次登录时间&#39;,</span><br><span class="line">login_count bigint comment &#39;累积登录天数&#39;,</span><br><span class="line">login_last_30d_count bigint comment &#39;最近 30 日登录天数&#39;,</span><br><span class="line">order_date_first string comment &#39;首次下单时间&#39;,</span><br><span class="line">order_date_last string comment 
&#39;末次下单时间&#39;,</span><br><span class="line">order_count bigint comment &#39;累积下单次数&#39;,</span><br><span class="line">order_amount decimal(16,2) comment &#39;累积下单金额&#39;,</span><br><span class="line">order_last_30d_count bigint comment &#39;最近 30 日下单次数&#39;,</span><br><span class="line">order_last_30d_amount decimal(16,2) comment &#39;最近 30 日下单金额&#39;,</span><br><span class="line">payment_date_first string comment &#39;首次支付时间&#39;,</span><br><span class="line">payment_date_last string comment &#39;末次支付时间&#39;,</span><br><span class="line">payment_count bigint comment &#39;累积支付次数&#39;,</span><br><span class="line">payment_amount decimal(16,2) comment &#39;累积支付金额&#39;,</span><br><span class="line">payment_last_30d_count bigint comment &#39;最近 30 日支付次数&#39;,</span><br><span class="line">payment_last_30d_amount decimal(16,2) comment &#39;最近 30 日支付金额&#39;</span><br><span class="line">  ) COMMENT &#39;会员主题宽表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwt&#x2F;mall&#x2F;user_topic&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span 
class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwt</span><br><span class="line">table_name&#x3D;user_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; 
+%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">nvl(new.user_id,old.user_id),</span><br><span class="line">if(old.login_date_first is null and</span><br><span class="line">new.login_count&gt;0,&#39;$db_date&#39;,old.login_date_first),</span><br><span class="line">if(new.login_count&gt;0,&#39;$db_date&#39;,old.login_date_last),</span><br><span class="line">nvl(old.login_count,0)+if(new.login_count&gt;0,1,0),</span><br><span class="line">nvl(new.login_last_30d_count,0),</span><br><span class="line">if(old.order_date_first is null and</span><br><span class="line">new.order_count&gt;0,&#39;$db_date&#39;,old.order_date_first),</span><br><span class="line">if(new.order_count&gt;0,&#39;$db_date&#39;,old.order_date_last),</span><br><span class="line">nvl(old.order_count,0)+nvl(new.order_count,0),</span><br><span class="line">nvl(old.order_amount,0)+nvl(new.order_amount,0),</span><br><span class="line">nvl(new.order_last_30d_count,0),</span><br><span class="line">nvl(new.order_last_30d_amount,0),</span><br><span class="line">if(old.payment_date_first is null and</span><br><span class="line">new.payment_count&gt;0,&#39;$db_date&#39;,old.payment_date_first),</span><br><span class="line">if(new.payment_count&gt;0,&#39;$db_date&#39;,old.payment_date_last),</span><br><span class="line">nvl(old.payment_count,0)+nvl(new.payment_count,0),</span><br><span class="line">nvl(old.payment_amount,0)+nvl(new.payment_amount,0),</span><br><span class="line">nvl(new.payment_last_30d_count,0),</span><br><span class="line">nvl(new.payment_last_30d_amount,0)</span><br><span class="line">from</span><br><span class="line">dwt.mall__user_topic old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span 
class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,login_count,0)) login_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,order_count,0)) order_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,order_amount,0)) order_amount,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,payment_count,0)) payment_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,payment_amount,0)) payment_amount,</span><br><span class="line">sum(if(login_count&gt;0,1,0)) login_last_30d_count,</span><br><span class="line">sum(order_count) order_last_30d_count,</span><br><span class="line">sum(order_amount) order_last_30d_amount,</span><br><span class="line">sum(payment_count) payment_last_30d_count,</span><br><span class="line">sum(payment_amount) payment_last_30d_amount</span><br><span class="line">from dws.mall__user_action_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(&#39;$db_date&#39;,-29) and dt&lt;&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by user_id</span><br><span class="line">)new</span><br><span class="line">on old.user_id&#x3D;new.user_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle47"></a></p><h3 id="11-3-商品主题宽表"><a href="#11-3-商品主题宽表" class="headerlink" title="11.3 商品主题宽表"></a>11.3 Product Topic Wide Table</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwt.mall__sku_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwt.mall__sku_topic&#96;(</span><br><span class="line">sku_id string comment &#39;sku_id&#39;,</span><br><span class="line">spu_id string comment &#39;spu_id&#39;,</span><br><span class="line">order_last_30d_count bigint comment &#39;最近 30 日被下单次数&#39;,</span><br><span class="line">order_last_30d_num bigint comment &#39;最近 30 日被下单件数&#39;,</span><br><span class="line">order_last_30d_amount decimal(16,2) comment &#39;最近 30 日被下单金额&#39;,</span><br><span class="line">order_count bigint comment &#39;累积被下单次数&#39;,</span><br><span class="line">order_num bigint comment &#39;累积被下单件数&#39;,</span><br><span class="line">order_amount decimal(16,2) comment &#39;累积被下单金额&#39;,</span><br><span class="line">payment_last_30d_count bigint comment &#39;最近 30 日被支付次数&#39;,</span><br><span class="line">payment_last_30d_num bigint comment &#39;最近 30 日被支付件数&#39;,</span><br><span class="line">payment_last_30d_amount decimal(16,2) comment &#39;最近 30 日被支付金额&#39;,</span><br><span class="line">payment_count bigint comment &#39;累积被支付次数&#39;,</span><br><span class="line">payment_num bigint comment 
&#39;累积被支付件数&#39;,</span><br><span class="line">payment_amount decimal(16,2) comment &#39;累积被支付金额&#39;,</span><br><span class="line">refund_last_30d_count bigint comment &#39;最近三十日退款次数&#39;,</span><br><span class="line">refund_last_30d_num bigint comment &#39;最近三十日退款件数&#39;,</span><br><span class="line">refund_last_30d_amount decimal(10,2) comment &#39;最近三十日退款金额&#39;,</span><br><span class="line">refund_count bigint comment &#39;累积退款次数&#39;,</span><br><span class="line">refund_num bigint comment &#39;累积退款件数&#39;,</span><br><span class="line">refund_amount decimal(10,2) comment &#39;累积退款金额&#39;,</span><br><span class="line">cart_last_30d_count bigint comment &#39;最近 30 日被加入购物车次数&#39;,</span><br><span class="line">cart_last_30d_num bigint comment &#39;最近 30 日被加入购物车件数&#39;,</span><br><span class="line">cart_count bigint comment &#39;累积被加入购物车次数&#39;,</span><br><span class="line">cart_num bigint comment &#39;累积被加入购物车件数&#39;,</span><br><span class="line">favor_last_30d_count bigint comment &#39;最近 30 日被收藏次数&#39;,</span><br><span class="line">favor_count bigint comment &#39;累积被收藏次数&#39;,</span><br><span class="line">appraise_last_30d_good_count bigint comment &#39;最近 30 日好评数&#39;,</span><br><span class="line">appraise_last_30d_mid_count bigint comment &#39;最近 30 日中评数&#39;,</span><br><span class="line">appraise_last_30d_bad_count bigint comment &#39;最近 30 日差评数&#39;,</span><br><span class="line">appraise_last_30d_default_count bigint comment &#39;最近 30 日默认评价数&#39;,</span><br><span class="line">appraise_good_count bigint comment &#39;累积好评数&#39;,</span><br><span class="line">appraise_mid_count bigint comment &#39;累积中评数&#39;,</span><br><span class="line">appraise_bad_count bigint comment &#39;累积差评数&#39;,</span><br><span class="line">appraise_default_count bigint comment &#39;累积默认评价数&#39;</span><br><span class="line">  ) COMMENT &#39;商品主题宽表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as 
parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwt&#x2F;mall&#x2F;sku_topic&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span 
class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span 
class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwt</span><br><span class="line">table_name&#x3D;sku_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">nvl(new.sku_id,old.sku_id), sku_info.spu_id,</span><br><span class="line">nvl(new.order_count30,0),</span><br><span class="line">nvl(new.order_num30,0),</span><br><span class="line">nvl(new.order_amount30,0),</span><br><span class="line">nvl(old.order_count,0) + nvl(new.order_count,0),</span><br><span 
class="line">nvl(old.order_num,0) + nvl(new.order_num,0),</span><br><span class="line">nvl(old.order_amount,0) + nvl(new.order_amount,0),</span><br><span class="line">nvl(new.payment_count30,0),</span><br><span class="line">nvl(new.payment_num30,0),</span><br><span class="line">nvl(new.payment_amount30,0),</span><br><span class="line">nvl(old.payment_count,0) + nvl(new.payment_count,0),</span><br><span class="line">nvl(old.payment_num,0) + nvl(new.payment_num,0),</span><br><span class="line">nvl(old.payment_amount,0) + nvl(new.payment_amount,0),</span><br><span class="line">nvl(new.refund_count30,0),</span><br><span class="line">nvl(new.refund_num30,0),</span><br><span class="line">nvl(new.refund_amount30,0),</span><br><span class="line">nvl(old.refund_count,0) + nvl(new.refund_count,0),</span><br><span class="line">nvl(old.refund_num,0) + nvl(new.refund_num,0),</span><br><span class="line">nvl(old.refund_amount,0) + nvl(new.refund_amount,0),</span><br><span class="line">nvl(new.cart_count30,0),</span><br><span class="line">nvl(new.cart_num30,0),</span><br><span class="line">nvl(old.cart_count,0) + nvl(new.cart_count,0),</span><br><span class="line">nvl(old.cart_num,0) + nvl(new.cart_num,0),</span><br><span class="line">nvl(new.favor_count30,0),</span><br><span class="line">nvl(old.favor_count,0) + nvl(new.favor_count,0),</span><br><span class="line">nvl(new.appraise_good_count30,0),</span><br><span class="line">nvl(new.appraise_mid_count30,0),</span><br><span class="line">nvl(new.appraise_bad_count30,0),</span><br><span class="line">nvl(new.appraise_default_count30,0),</span><br><span class="line">nvl(old.appraise_good_count,0) + nvl(new.appraise_good_count,0),</span><br><span class="line">nvl(old.appraise_mid_count,0) + nvl(new.appraise_mid_count,0),</span><br><span class="line">nvl(old.appraise_bad_count,0) + nvl(new.appraise_bad_count,0),</span><br><span class="line">nvl(old.appraise_default_count,0) + nvl(new.appraise_default_count,0)</span><br><span 
class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span class="line">spu_id,</span><br><span class="line">order_last_30d_count,</span><br><span class="line">order_last_30d_num,</span><br><span class="line">order_last_30d_amount,</span><br><span class="line">order_count,</span><br><span class="line">order_num,</span><br><span class="line">order_amount ,</span><br><span class="line">payment_last_30d_count,</span><br><span class="line">payment_last_30d_num,</span><br><span class="line">payment_last_30d_amount,</span><br><span class="line">payment_count,</span><br><span class="line">payment_num,</span><br><span class="line">payment_amount,</span><br><span class="line">refund_last_30d_count,</span><br><span class="line">refund_last_30d_num,</span><br><span class="line">refund_last_30d_amount,</span><br><span class="line">refund_count,</span><br><span class="line">refund_num,</span><br><span class="line">refund_amount,</span><br><span class="line">cart_last_30d_count,</span><br><span class="line">cart_last_30d_num,</span><br><span class="line">cart_count,</span><br><span class="line">cart_num,</span><br><span class="line">favor_last_30d_count,</span><br><span class="line">favor_count,</span><br><span class="line">appraise_last_30d_good_count,</span><br><span class="line">appraise_last_30d_mid_count,</span><br><span class="line">appraise_last_30d_bad_count,</span><br><span class="line">appraise_last_30d_default_count,</span><br><span class="line">appraise_good_count,</span><br><span class="line">appraise_mid_count,</span><br><span class="line">appraise_bad_count,</span><br><span class="line">appraise_default_count</span><br><span class="line">from dwt.mall__sku_topic</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">sku_id,</span><br><span 
class="line">sum(if(dt&#x3D;&#39;$db_date&#39;, order_count,0 )) order_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,order_num ,0 )) order_num,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,order_amount,0 )) order_amount ,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,payment_count,0 )) payment_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,payment_num,0 )) payment_num,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,payment_amount,0 )) payment_amount,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,refund_count,0 )) refund_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,refund_num,0 )) refund_num,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,refund_amount,0 )) refund_amount,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,cart_count,0 )) cart_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,cart_num,0 )) cart_num,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,favor_count,0 )) favor_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,appraise_good_count,0 )) appraise_good_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,appraise_mid_count,0 ) ) appraise_mid_count ,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,appraise_bad_count,0 )) appraise_bad_count,</span><br><span class="line">sum(if(dt&#x3D;&#39;$db_date&#39;,appraise_default_count,0 )) appraise_default_count,</span><br><span class="line">sum(order_count) order_count30 ,</span><br><span class="line">sum(order_num) order_num30,</span><br><span class="line">sum(order_amount) order_amount30,</span><br><span class="line">sum(payment_count) payment_count30,</span><br><span class="line">sum(payment_num) payment_num30,</span><br><span class="line">sum(payment_amount) payment_amount30,</span><br><span class="line">sum(refund_count) 
refund_count30,</span><br><span class="line">sum(refund_num) refund_num30,</span><br><span class="line">sum(refund_amount) refund_amount30,</span><br><span class="line">sum(cart_count) cart_count30,</span><br><span class="line">sum(cart_num) cart_num30,</span><br><span class="line">sum(favor_count) favor_count30,</span><br><span class="line">sum(appraise_good_count) appraise_good_count30,</span><br><span class="line">sum(appraise_mid_count) appraise_mid_count30,</span><br><span class="line">sum(appraise_bad_count) appraise_bad_count30,</span><br><span class="line">sum(appraise_default_count) appraise_default_count30</span><br><span class="line">from dws.mall__sku_action_daycount</span><br><span class="line">where dt &gt;&#x3D; date_add(&#39;$db_date&#39;, -29) and dt &lt;&#x3D; &#39;$db_date&#39;</span><br><span class="line">group by sku_id</span><br><span class="line">)new</span><br><span class="line">on new.sku_id &#x3D; old.sku_id</span><br><span class="line">left join</span><br><span class="line">(</span><br><span class="line">select * from dwd.mall__dim_sku_info where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">) sku_info</span><br><span class="line">on nvl(new.sku_id,old.sku_id)&#x3D; sku_info.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle48"></a></p><h3 id="11-4-优惠卷主题宽表"><a href="#11-4-优惠卷主题宽表" class="headerlink" title="11.4 优惠卷主题宽表"></a>11.4 Coupon Topic Wide Table</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td 
class="code"><pre><span class="line">drop table if exists dwt.mall__coupon_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwt.mall__coupon_topic&#96;(</span><br><span class="line">&#96;coupon_id&#96; string COMMENT &#39;coupon ID&#39;,</span><br><span class="line">&#96;get_day_count&#96; bigint COMMENT &#39;times claimed on the day&#39;,</span><br><span class="line">&#96;using_day_count&#96; bigint COMMENT &#39;times used (order placed) on the day&#39;,</span><br><span class="line">&#96;used_day_count&#96; bigint COMMENT &#39;times used (order paid) on the day&#39;,</span><br><span class="line">&#96;get_count&#96; bigint COMMENT &#39;cumulative times claimed&#39;,</span><br><span class="line">&#96;using_count&#96; bigint COMMENT &#39;cumulative times used (order placed)&#39;,</span><br><span class="line">&#96;used_count&#96; bigint COMMENT &#39;cumulative times used (order paid)&#39;</span><br><span class="line">  ) COMMENT &#39;coupon topic wide table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwt&#x2F;mall&#x2F;coupon_topic&#x2F;&#39;;</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwt</span><br><span class="line">table_name&#x3D;coupon_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">nvl(new.coupon_id,old.coupon_id),</span><br><span class="line">nvl(new.get_count,0),</span><br><span class="line">nvl(new.using_count,0),</span><br><span class="line">nvl(new.used_count,0),</span><br><span class="line">nvl(old.get_count,0)+nvl(new.get_count,0),</span><br><span class="line">nvl(old.using_count,0)+nvl(new.using_count,0),</span><br><span class="line">nvl(old.used_count,0)+nvl(new.used_count,0)</span><br><span 
class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwt.mall__coupon_topic</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">coupon_id,</span><br><span class="line">get_count,</span><br><span class="line">using_count,</span><br><span class="line">used_count</span><br><span class="line">from dws.mall__coupon_use_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)new</span><br><span class="line">on old.coupon_id&#x3D;new.coupon_id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle49"></a></p><h3 id="11-5-活动主题宽表"><a href="#11-5-活动主题宽表" class="headerlink" title="11.5 活动主题宽表"></a>11.5 Activity Topic Wide Table</h3></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists dwt.mall__activity_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;dwt.mall__activity_topic&#96;(</span><br><span class="line">&#96;id&#96; string COMMENT &#39;activity ID&#39;,</span><br><span class="line">&#96;activity_name&#96; string COMMENT &#39;activity name&#39;,</span><br><span class="line">&#96;order_day_count&#96; bigint COMMENT &#39;orders placed on the day&#39;,</span><br><span class="line">&#96;payment_day_count&#96; bigint COMMENT &#39;payments on the day&#39;,</span><br><span 
class="line">&#96;order_count&#96; bigint COMMENT &#39;累积下单次数&#39;,</span><br><span class="line">&#96;payment_count&#96; bigint COMMENT &#39;累积支付次数&#39;</span><br><span class="line">  ) COMMENT &#39;活动主题宽表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;dwt&#x2F;mall&#x2F;activity_topic&#x2F;&#39;</span><br></pre></td></tr></table></figure></li><li><p>导入数据</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span 
class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;dwt</span><br><span class="line">table_name&#x3D;activity_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># 如果是输入的日期按照取输入日期；如果没输入日期取当前时间的前一天</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert overwrite table $hive_table_name</span><br><span class="line">select</span><br><span class="line">nvl(new.id,old.id),</span><br><span class="line">nvl(new.activity_name,old.activity_name),</span><br><span class="line">nvl(new.order_count,0),</span><br><span class="line">nvl(new.payment_count,0),</span><br><span class="line">nvl(old.order_count,0)+nvl(new.order_count,0),</span><br><span class="line">nvl(old.payment_count,0)+nvl(new.payment_count,0)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">*</span><br><span class="line">from dwt.mall__activity_topic</span><br><span class="line">)old</span><br><span class="line">full outer join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">id,</span><br><span class="line">activity_name,</span><br><span class="line">order_count,</span><br><span class="line">payment_count</span><br><span class="line">from dws.mall__activity_info_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)new</span><br><span class="line">on 
old.id&#x3D;new.id;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle50"></a></p><h2 id="12-ADS层构建"><a href="#12-ADS层构建" class="headerlink" title="12 ADS层构建"></a>12 Building the ADS Layer</h2><blockquote><p>This is the final layer, the one that serves data requests. Whether to compress its tables depends on how the data will be exported and how much of it there is. The tables are not partitioned, and new results are written every day.</p></blockquote></li></ul><p><a name="blogTitle51"></a></p><h3 id="12-1-设备主题"><a href="#12-1-设备主题" class="headerlink" title="12.1 设备主题"></a>12.1 Device Topic</h3><p><a name="39d05c99"></a></p><h4 id="12-1-1-活跃设备数（日、周、月）"><a href="#12-1-1-活跃设备数（日、周、月）" class="headerlink" title="12.1.1 活跃设备数（日、周、月）"></a>12.1.1 Active Devices (Daily, Weekly, Monthly)</h4><blockquote><p>Daily active: number of devices active on the day<br>Weekly active: number of devices active during the week<br>Monthly active: number of devices active during the month</p></blockquote><ul><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__uv_count;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__uv_count&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;day_count&#96; bigint COMMENT &#39;active devices on the day&#39;,</span><br><span class="line">&#96;wk_count&#96; bigint COMMENT &#39;active devices in the week&#39;,</span><br><span class="line">&#96;mn_count&#96; bigint COMMENT &#39;active devices in the month&#39;,</span><br><span class="line">&#96;is_weekend&#96; string COMMENT &#39;Y or N: whether the date is the last day of the week, used to pick the final weekly result&#39;,</span><br><span class="line">&#96;is_monthend&#96; string COMMENT &#39;Y or N: whether the date is the last day of the month, used to pick the final monthly result&#39;</span><br><span class="line">  ) COMMENT &#39;active device count table&#39;</span><br><span class="line">row 
format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;uv_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span 
class="line">49</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;uv_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">daycount.ct,</span><br><span class="line">wkcount.ct,</span><br><span class="line">mncount.ct,</span><br><span class="line">if(date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-1)&#x3D;&#39;$db_date&#39;,&#39;Y&#39;,&#39;N&#39;) ,</span><br><span class="line">if(last_day(&#39;$db_date&#39;)&#x3D;&#39;$db_date&#39;,&#39;Y&#39;,&#39;N&#39;)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">count(*) ct</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_last&#x3D;&#39;$db_date&#39;</span><br><span class="line">)daycount join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">count (*) ct</span><br><span 
class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_last&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7)</span><br><span class="line">and login_date_last&lt;&#x3D; date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-1)</span><br><span class="line">) wkcount on daycount.dt&#x3D;wkcount.dt</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">count (*) ct</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where</span><br><span class="line">date_format(login_date_last,&#39;yyyy-MM&#39;)&#x3D;date_format(&#39;$db_date&#39;,&#39;yyyy-MM&#39;)</span><br><span class="line">)mncount on daycount.dt&#x3D;mncount.dt;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="28015fa8"></a></p><h4 id="12-1-2-每日新增设备"><a href="#12-1-2-每日新增设备" class="headerlink" title="12.1.2 每日新增设备"></a>12.1.2 New Devices per Day</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__new_mid_count;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__new_mid_count&#96;(</span><br><span class="line">&#96;create_date&#96; string comment &#39;creation date&#39; ,</span><br><span class="line">&#96;new_mid_count&#96; bigint comment &#39;number of new devices&#39;</span><br><span class="line">  ) COMMENT &#39;daily new device table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as 
parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;new_mid_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;new_mid_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span 
class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">login_date_first,</span><br><span class="line">count(*)</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_first&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by login_date_first;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="cc741d26"></a></p><h4 id="12-1-3-沉默用户数"><a href="#12-1-3-沉默用户数" class="headerlink" title="12.1.3 沉默用户数"></a>12.1.3 Silent Users</h4><blockquote><p>Silent user: a device that was launched only on the day it was installed, where that launch happened at least 7 days ago</p></blockquote></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__silent_count;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__silent_count&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;silent_count&#96; bigint COMMENT &#39;number of silent devices&#39;</span><br><span class="line">  ) COMMENT &#39;silent user count table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;silent_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;silent_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">count(*)</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_first&#x3D;login_date_last</span><br><span class="line">and login_date_last&lt;&#x3D;date_add(&#39;$db_date&#39;,-7);</span><br><span 
class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="1cb2adbb"></a></p><h4 id="12-1-4-本周回流用户数"><a href="#12-1-4-本周回流用户数" class="headerlink" title="12.1.4 本周回流用户数"></a>12.1.4 本周回流用户数</h4><blockquote><p>本周回流用户：上周未活跃，本周活跃的设备，且不是本周新增设备</p></blockquote></li><li><p>建表</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__back_count</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__back_count&#96;(</span><br><span class="line">&#96;wk_dt&#96; string COMMENT &#39;统计日期所在周&#39;,</span><br><span class="line">&#96;wastage_count&#96; bigint COMMENT &#39;回流设备数&#39;</span><br><span class="line">  ) COMMENT &#39;本周回流用户数表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;back_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>导入数据</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;back_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is passed in, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">count(*)</span><br><span class="line">from</span><br><span 
class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_last&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7)</span><br><span class="line">and login_date_last&lt;&#x3D; date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-1)</span><br><span class="line">and login_date_first&lt;date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7)</span><br><span class="line">)current_wk</span><br><span class="line">left join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dws.mall__uv_detail_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7*2)</span><br><span class="line">and dt&lt;&#x3D; date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7-1)</span><br><span class="line">group by mid_id</span><br><span class="line">)last_wk</span><br><span class="line">on current_wk.mid_id&#x3D;last_wk.mid_id</span><br><span class="line">where last_wk.mid_id is null;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="30889b3f"></a></p><h4 id="12-1-5-流失用户数"><a href="#12-1-5-流失用户数" class="headerlink" title="12.1.5 流失用户数"></a>12.1.5 Churned Users</h4><blockquote><p>Churned user: a device that has not been active in the last 7 days</p></blockquote></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__wastage_count;</span><br><span class="line">CREATE EXTERNAL TABLE 
&#96;ads.mall__wastage_count&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;wastage_count&#96; bigint COMMENT &#39;number of churned devices&#39;</span><br><span class="line">  ) COMMENT &#39;churned user count table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;wastage_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span 
class="line">table_name&#x3D;wastage_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">count(*)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">where login_date_last&lt;&#x3D;date_add(&#39;$db_date&#39;,-7)</span><br><span class="line">group by mid_id</span><br><span class="line">)t1;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="f3ba6663"></a></p><h4 id="12-1-6-留存率"><a href="#12-1-6-留存率" class="headerlink" title="12.1.6 留存率"></a>12.1.6 Retention rate</h4><p><img src="https://cdn.nlark.com/yuque/0/2020/png/1072113/1596445445739-01d3c47c-ce57-475d-9b12-238c669b7f15.png#align=left&display=inline&height=412&margin=%5Bobject%20Object%5D&originHeight=412&originWidth=963&size=0&status=done&style=none&width=963" alt></p></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__user_retention_day_rate;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__user_retention_day_rate&#96;(</span><br><span class="line">&#96;stat_date&#96; string comment &#39;stat date&#39;,</span><br><span class="line">&#96;create_date&#96; string comment &#39;date the device was first seen&#39;,</span><br><span class="line">&#96;retention_day&#96; int comment &#39;days retained as of the stat date&#39;,</span><br><span class="line">&#96;retention_count&#96; bigint comment &#39;retained device count&#39;,</span><br><span class="line">&#96;new_mid_count&#96; bigint comment &#39;new device count&#39;,</span><br><span class="line">&#96;retention_ratio&#96; decimal(10,2) comment &#39;retention rate&#39;</span><br><span class="line">  ) COMMENT &#39;retention rate&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;user_retention_day_rate&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;user_retention_day_rate</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span 
class="line">&#39;$db_date&#39;,--stat date</span><br><span class="line">date_add(&#39;$db_date&#39;,-1),--first-seen date</span><br><span class="line">1,--retention days</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-1) and</span><br><span class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0)),--e.g. 1-day retained count for devices new on 2020-03-09</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-1),1,0)),--e.g. devices new on 2020-03-09</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-1) and</span><br><span class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0))&#x2F;sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-1),1,0))*100</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,--stat date</span><br><span class="line">date_add(&#39;$db_date&#39;,-2),--first-seen date</span><br><span class="line">2,--retention days</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-2) and</span><br><span class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0)),--e.g. 2-day retained count for devices new on 2020-03-08</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-2),1,0)),--e.g. devices new on 2020-03-08</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-2) and</span><br><span class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0))&#x2F;sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-2),1,0))*100</span><br><span class="line">from dwt.mall__uv_topic</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,--stat date</span><br><span class="line">date_add(&#39;$db_date&#39;,-3),--first-seen date</span><br><span class="line">3,--retention days</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-3) and</span><br><span 
class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0)),--e.g. 3-day retained count for devices new on 2020-03-07</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-3),1,0)),--e.g. devices new on 2020-03-07</span><br><span class="line">sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-3) and</span><br><span class="line">login_date_last&#x3D;&#39;$db_date&#39;,1,0))&#x2F;sum(if(login_date_first&#x3D;date_add(&#39;$db_date&#39;,-3),1,0))*100</span><br><span class="line">from dwt.mall__uv_topic;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="06478dde"></a></p><h4 id="12-1-7-最近连续三周活跃用户数"><a href="#12-1-7-最近连续三周活跃用户数" class="headerlink" title="12.1.7 最近连续三周活跃用户数"></a>12.1.7 Users active in each of the last three consecutive weeks</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__continuity_wk_count;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__continuity_wk_count&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;stat date: usually the Sunday that ends the last week; if computed daily, the current date&#39;,</span><br><span class="line">&#96;wk_dt&#96; string COMMENT &#39;date range covered&#39;,</span><br><span class="line">&#96;continuity_count&#96; bigint COMMENT &#39;number of active users&#39;</span><br><span class="line">  ) COMMENT &#39;users active in each of the last three consecutive weeks&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;continuity_wk_count&#x2F;&#39;</span><br><span class="line">tblproperties 
(&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span 
class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;continuity_wk_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">concat(date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-7*3),&#39;_&#39;,date_add(next_day(&#39;$db_date&#39;,&#39;MO&#39;),-1)),</span><br><span class="line">count(*)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dws.mall__uv_detail_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-7)</span><br><span class="line">and dt&lt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-1)</span><br><span class="line">group by mid_id</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dws.mall__uv_detail_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-7*2)</span><br><span 
class="line">and dt&lt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-7-1)</span><br><span class="line">group by mid_id</span><br><span class="line">union all</span><br><span class="line">select</span><br><span class="line">mid_id</span><br><span class="line">from dws.mall__uv_detail_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-7*3)</span><br><span class="line">and dt&lt;&#x3D;date_add(next_day(&#39;$db_date&#39;,&#39;monday&#39;),-7*2-1)</span><br><span class="line">group by mid_id</span><br><span class="line">)t1</span><br><span class="line">group by mid_id</span><br><span class="line">having count(*)&#x3D;3</span><br><span class="line">)t2;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="34c5f2dd"></a></p><h4 id="12-1-8-最近七天内连续三天活跃用户数"><a href="#12-1-8-最近七天内连续三天活跃用户数" class="headerlink" title="12.1.8 最近七天内连续三天活跃用户数"></a>12.1.8 Users active on three consecutive days within the last seven days</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__continuity_uv_count;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__continuity_uv_count&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;stat date&#39;,</span><br><span class="line">&#96;wk_dt&#96; string COMMENT &#39;date range of the last 7 days&#39;,</span><br><span class="line">&#96;continuity_count&#96; bigint</span><br><span class="line">  ) COMMENT &#39;users active on three consecutive days within the last seven days&#39;</span><br><span class="line">row format delimited fields terminated by 
&#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;continuity_uv_count&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span 
class="line">50</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;continuity_uv_count</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">concat(date_add(&#39;$db_date&#39;,-6),&#39;_&#39;,&#39;$db_date&#39;),</span><br><span class="line">count(*)</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select </span><br><span class="line">mid_id</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select </span><br><span class="line">mid_id</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">date_sub(dt,rk) date_dif</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">mid_id,</span><br><span class="line">dt,</span><br><span class="line">rank() over(partition by mid_id order by dt) rk</span><br><span class="line">from 
dws.mall__uv_detail_daycount</span><br><span class="line">where dt&gt;&#x3D;date_add(&#39;$db_date&#39;,-6) and</span><br><span class="line">dt&lt;&#x3D;&#39;$db_date&#39;</span><br><span class="line">)t1</span><br><span class="line">)t2</span><br><span class="line">group by mid_id,date_dif</span><br><span class="line">having count(*)&gt;&#x3D;3</span><br><span class="line">)t3</span><br><span class="line">group by mid_id</span><br><span class="line">)t4;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle52"></a></p><h3 id="12-2-会员主题"><a href="#12-2-会员主题" class="headerlink" title="12.2 会员主题"></a>12.2 Member topic</h3><p><a name="497e0770"></a></p><h4 id="12-2-1-会员主题信息"><a href="#12-2-1-会员主题信息" class="headerlink" title="12.2.1 会员主题信息"></a>12.2.1 Member topic information</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__user_topic;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__user_topic&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;stat date&#39;,</span><br><span class="line">&#96;day_users&#96; string COMMENT &#39;active members&#39;,</span><br><span class="line">&#96;day_new_users&#96; string COMMENT &#39;new members&#39;,</span><br><span class="line">&#96;day_new_payment_users&#96; string COMMENT &#39;new paying members&#39;,</span><br><span 
class="line">&#96;payment_users&#96; string COMMENT &#39;total paying members&#39;,</span><br><span class="line">&#96;users&#96; string COMMENT &#39;total members&#39;,</span><br><span class="line">&#96;day_users2users&#96; decimal(10,2) COMMENT &#39;member activity rate&#39;,</span><br><span class="line">&#96;payment_users2users&#96; decimal(10,2) COMMENT &#39;member payment rate&#39;,</span><br><span class="line">&#96;day_new_users2users&#96; decimal(10,2) COMMENT &#39;member freshness&#39;</span><br><span class="line">  ) COMMENT &#39;member topic information&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;user_topic&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span 
class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;user_topic</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">sum(if(login_date_last&#x3D;&#39;$db_date&#39;,1,0)),</span><br><span class="line">sum(if(login_date_first&#x3D;&#39;$db_date&#39;,1,0)),</span><br><span class="line">sum(if(payment_date_first&#x3D;&#39;$db_date&#39;,1,0)),</span><br><span class="line">sum(if(payment_count&gt;0,1,0)),</span><br><span class="line">count(*),</span><br><span class="line">sum(if(login_date_last&#x3D;&#39;$db_date&#39;,1,0))&#x2F;count(*),</span><br><span class="line">sum(if(payment_count&gt;0,1,0))&#x2F;count(*),</span><br><span class="line">sum(if(login_date_first&#x3D;&#39;$db_date&#39;,1,0))&#x2F;sum(if(login_date_last&#x3D;&#39;$db_date&#39;,1,0))</span><br><span class="line">from dwt.mall__user_topic;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="c44b12bf"></a></p><h4 id="12-2-2-漏斗分析"><a href="#12-2-2-漏斗分析" class="headerlink" title="12.2.2 漏斗分析"></a>12.2.2 Funnel analysis</h4><blockquote><p>Compute the conversion rates along the “browse -&gt; add to cart -&gt; place order -&gt; pay” funnel.<br>Approach: count the users who performed each action, then divide each step by the step before it.</p></blockquote></li><li><p>Create the table</p><figure 
class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__user_action_convert_day;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__user_action_convert_day&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;stat date&#39;,</span><br><span class="line">&#96;total_visitor_m_count&#96; bigint COMMENT &#39;total visitors&#39;,</span><br><span class="line">&#96;cart_u_count&#96; bigint COMMENT &#39;users who added to cart&#39;,</span><br><span class="line">&#96;visitor2cart_convert_ratio&#96; decimal(10,2) COMMENT &#39;visit-to-cart conversion rate&#39;,</span><br><span class="line">&#96;order_u_count&#96; bigint COMMENT &#39;users who placed an order&#39;,</span><br><span class="line">&#96;cart2order_convert_ratio&#96; decimal(10,2) COMMENT &#39;cart-to-order conversion rate&#39;,</span><br><span class="line">&#96;payment_u_count&#96; bigint COMMENT &#39;users who paid&#39;,</span><br><span class="line">&#96;order2payment_convert_ratio&#96; decimal(10,2) COMMENT &#39;order-to-payment conversion rate&#39;</span><br><span class="line">  ) COMMENT &#39;funnel analysis&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;user_action_convert_day&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight 
plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;user_action_convert_day</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># Use the date passed in if one is given; otherwise use yesterday</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        
db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">uv.day_count,</span><br><span class="line">ua.cart_count,</span><br><span class="line">cast(ua.cart_count&#x2F;uv.day_count as decimal(10,2)) visitor2cart_convert_ratio,</span><br><span class="line">ua.order_count,</span><br><span class="line">cast(ua.order_count&#x2F;ua.cart_count as decimal(10,2)) cart2order_convert_ratio,</span><br><span class="line">ua.payment_count,</span><br><span class="line">cast(ua.payment_count&#x2F;ua.order_count as decimal(10,2)) order2payment_convert_ratio</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">dt,</span><br><span class="line">sum(if(cart_count&gt;0,1,0)) cart_count,</span><br><span class="line">sum(if(order_count&gt;0,1,0)) order_count,</span><br><span class="line">sum(if(payment_count&gt;0,1,0)) payment_count</span><br><span class="line">from dws.mall__user_action_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">group by dt</span><br><span class="line">)ua join ads.mall__uv_count uv on uv.dt&#x3D;ua.dt;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle53"></a></p><h3 id="12-3-商品主题"><a href="#12-3-商品主题" class="headerlink" title="12.3 商品主题"></a>12.3 Product topic</h3><p><a name="732a2a9a"></a></p><h4 id="12-3-1-商品个数信息"><a href="#12-3-1-商品个数信息" class="headerlink" title="12.3.1 商品个数信息"></a>12.3.1 Product counts</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__product_info;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__product_info&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;stat date&#39;,</span><br><span class="line">&#96;sku_num&#96; string COMMENT &#39;sku count&#39;,</span><br><span class="line">&#96;spu_num&#96; string COMMENT &#39;spu count&#39;</span><br><span class="line">  ) COMMENT &#39;product counts&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;product_info&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;product_info</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sku_num,</span><br><span class="line">spu_num</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">count(*) sku_num</span><br><span 
class="line">from</span><br><span class="line">dwt.mall__sku_topic</span><br><span class="line">) tmp_sku_num</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">count(*) spu_num</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">spu_id</span><br><span class="line">from</span><br><span class="line">dwt.mall__sku_topic</span><br><span class="line">group by</span><br><span class="line">spu_id</span><br><span class="line">) tmp_spu_id</span><br><span class="line">) tmp_spu_num</span><br><span class="line">on</span><br><span class="line">tmp_sku_num.dt&#x3D;tmp_spu_num.dt;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="be51cd91"></a></p><h4 id="12-3-2-商品销量排行"><a href="#12-3-2-商品销量排行" class="headerlink" title="12.3.2 商品销量排行"></a>12.3.2 Product Sales Ranking</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__product_sale_topN;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__product_sale_topN&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;product ID&#39;,</span><br><span class="line">&#96;payment_amount&#96; decimal(16,2) COMMENT &#39;payment amount&#39;</span><br><span class="line">  ) COMMENT &#39;product sales ranking table&#39;</span><br><span class="line">row format delimited fields 
terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;product_sale_topN&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>导入数据</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;product_sale_topN</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># 如果是输入的日期按照取输入日期；如果没输入日期取当前时间的前一天</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span 
class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sku_id,</span><br><span class="line">payment_amount</span><br><span class="line">from</span><br><span class="line">dws.mall__sku_action_daycount</span><br><span class="line">where</span><br><span class="line">dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">order by payment_amount desc</span><br><span class="line">limit 10;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="06076ff3"></a></p><h4 id="12-3-3-商品收藏排名"><a href="#12-3-3-商品收藏排名" class="headerlink" title="12.3.3 商品收藏排名"></a>12.3.3 Product Favorites Ranking</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__product_favor_topN;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__product_favor_topN&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;product ID&#39;,</span><br><span class="line">&#96;favor_count&#96; bigint COMMENT &#39;favorite count&#39;</span><br><span class="line">  ) COMMENT &#39;product favorites ranking table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as 
parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;product_favor_topN&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>导入数据</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;product_favor_topN</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># 如果是输入的日期按照取输入日期；如果没输入日期取当前时间的前一天</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date 
-d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sku_id,</span><br><span class="line">favor_count</span><br><span class="line">from</span><br><span class="line">dws.mall__sku_action_daycount</span><br><span class="line">where</span><br><span class="line">dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">order by favor_count desc</span><br><span class="line">limit 10;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="3a119480"></a></p><h4 id="12-3-4-商品加入购物车排名"><a href="#12-3-4-商品加入购物车排名" class="headerlink" title="12.3.4 商品加入购物车排名"></a>12.3.4 商品加入购物车排名</h4></li><li><p>建表</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__product_cart_topN</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__product_cart_topN&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;统计日期&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;商品 ID&#39;,</span><br><span class="line">&#96;cart_num&#96; bigint COMMENT &#39;加入购物车数量&#39;</span><br><span class="line">  ) COMMENT &#39;商品加入购物车排名表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;product_cart_topN&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;product_cart_topN</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span 
class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sku_id,</span><br><span class="line">cart_num</span><br><span class="line">from</span><br><span class="line">dws.mall__sku_action_daycount</span><br><span class="line">where</span><br><span class="line">dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">order by cart_num desc</span><br><span class="line">limit 10;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="1f46c297"></a></p><h4 id="12-3-5-商品退款率排名（近30天）"><a href="#12-3-5-商品退款率排名（近30天）" class="headerlink" title="12.3.5 商品退款率排名（近30天）"></a>12.3.5 商品退款率排名（近30天）</h4></li><li><p>建表</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__product_refund_topN</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__product_refund_topN&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;统计日期&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;商品 ID&#39;,</span><br><span class="line">&#96;refund_ratio&#96; decimal(10,2) COMMENT &#39;退款率&#39;</span><br><span class="line">  ) COMMENT &#39;商品退款率排名(最近 30 天)表&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;product_refund_topN&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;product_refund_topN</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span 
class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">sku_id,</span><br><span class="line">refund_last_30d_count&#x2F;payment_last_30d_count*100 refund_ratio</span><br><span class="line">from dwt.mall__sku_topic</span><br><span class="line">order by refund_ratio desc</span><br><span class="line">limit 10;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="e3f57ae4"></a></p><h4 id="12-3-6-商品差评率"><a href="#12-3-6-商品差评率" class="headerlink" title="12.3.6 商品差评率"></a>12.3.6 Product Negative Review Rate</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__appraise_bad_topN;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__appraise_bad_topN&#96;(</span><br><span class="line">&#96;dt&#96; string COMMENT &#39;statistics date&#39;,</span><br><span class="line">&#96;sku_id&#96; string COMMENT &#39;product ID&#39;,</span><br><span class="line">&#96;appraise_bad_ratio&#96; decimal(10,2) COMMENT &#39;negative review rate&#39;</span><br><span class="line">  ) COMMENT &#39;product negative review rate table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;appraise_bad_topN&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure 
class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;appraise_bad_topN</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span 
class="line">sku_id,</span><br><span class="line">appraise_bad_count&#x2F;(appraise_good_count+appraise_mid_count+appraise_bad_count+appraise_default_count) appraise_bad_ratio</span><br><span class="line">from</span><br><span class="line">dws.mall__sku_action_daycount</span><br><span class="line">where</span><br><span class="line">dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">order by appraise_bad_ratio desc</span><br><span class="line">limit 10;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="blogTitle54"></a></p><h3 id="12-4-营销主题"><a href="#12-4-营销主题" class="headerlink" title="12.4 营销主题"></a>12.4 Marketing Topic</h3><p><a name="8caa923f"></a></p><h4 id="12-4-1-下单数目统计"><a href="#12-4-1-下单数目统计" class="headerlink" title="12.4.1 下单数目统计"></a>12.4.1 Order Statistics</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__order_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__order_daycount&#96;(</span><br><span class="line">dt string comment &#39;statistics date&#39;,</span><br><span class="line">order_count bigint comment &#39;orders placed per day&#39;,</span><br><span class="line">order_amount bigint comment &#39;order amount per day&#39;,</span><br><span class="line">order_users bigint comment &#39;users who placed orders per day&#39;</span><br><span class="line">  ) COMMENT &#39;order statistics table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location 
&#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;order_daycount&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;order_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into 
table $hive_table_name</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39;,</span><br><span class="line">sum(order_count),</span><br><span class="line">sum(order_amount),</span><br><span class="line">sum(if(order_count&gt;0,1,0))</span><br><span class="line">from dws.mall__user_action_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="4e99b63e"></a></p><h4 id="12-4-2-支付信息统计"><a href="#12-4-2-支付信息统计" class="headerlink" title="12.4.2 支付信息统计"></a>12.4.2 Payment Statistics</h4></li><li><p>Create the table</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__payment_daycount;</span><br><span class="line">CREATE EXTERNAL TABLE &#96;ads.mall__payment_daycount&#96;(</span><br><span class="line">dt string comment &#39;statistics date&#39;,</span><br><span class="line">order_count bigint comment &#39;payments per day&#39;,</span><br><span class="line">order_amount bigint comment &#39;payment amount per day&#39;,</span><br><span class="line">payment_user_count bigint comment &#39;paying users per day&#39;,</span><br><span class="line">payment_sku_count bigint comment &#39;SKUs paid for per day&#39;,</span><br><span class="line">payment_avg_time double comment &#39;average time from order to payment, in minutes&#39;</span><br><span class="line">  ) COMMENT &#39;payment statistics table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as 
parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;payment_daycount&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;)</span><br></pre></td></tr></table></figure></li><li><p>导入数据</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span 
class="line">52</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;payment_daycount</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date was passed in, use it; otherwise use the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">tmp_payment.dt,</span><br><span class="line">tmp_payment.payment_count,</span><br><span class="line">tmp_payment.payment_amount,</span><br><span class="line">tmp_payment.payment_user_count,</span><br><span class="line">tmp_skucount.payment_sku_count,</span><br><span class="line">tmp_time.payment_avg_time</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sum(payment_count) payment_count,</span><br><span class="line">sum(payment_amount) payment_amount,</span><br><span class="line">sum(if(payment_count&gt;0,1,0)) payment_user_count</span><br><span class="line">from dws.mall__user_action_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)tmp_payment</span><br><span class="line">join</span><br><span class="line">(</span><br><span 
class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sum(if(payment_count&gt;0,1,0)) payment_sku_count</span><br><span class="line">from dws.mall__sku_action_daycount</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">)tmp_skucount on tmp_payment.dt&#x3D;tmp_skucount.dt</span><br><span class="line">join</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">&#39;$db_date&#39; dt,</span><br><span class="line">sum(unix_timestamp(payment_time)-unix_timestamp(create_time))&#x2F;count(*)&#x2F;60</span><br><span class="line">payment_avg_time</span><br><span class="line">from dwd.mall__fact_order_info</span><br><span class="line">where dt&#x3D;&#39;$db_date&#39;</span><br><span class="line">and payment_time is not null</span><br><span class="line">)tmp_time on tmp_payment.dt&#x3D;tmp_time.dt</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure><p><a name="c20ec38b"></a></p><h4 id="12-4-3-复购率"><a href="#12-4-3-复购率" class="headerlink" title="12.4.3 复购率"></a>12.4.3 复购率</h4></li><li><p>建表</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">drop table if exists ads.mall__sale_tm_category1_stat_mn</span><br><span class="line">CREATE EXTERNAL TABLE 
&#96;ads.mall__sale_tm_category1_stat_mn&#96;(</span><br><span class="line">tm_id string comment &#39;brand id&#39;,</span><br><span class="line">category1_id string comment &#39;level-1 category id&#39;,</span><br><span class="line">category1_name string comment &#39;level-1 category name&#39;,</span><br><span class="line">buycount bigint comment &#39;number of buyers&#39;,</span><br><span class="line">buy_twice_last bigint comment &#39;number of buyers with 2+ purchases&#39;,</span><br><span class="line">buy_twice_last_ratio decimal(10,2) comment &#39;repurchase rate (2+ purchases)&#39;,</span><br><span class="line">buy_3times_last bigint comment &#39;number of buyers with 3+ purchases&#39;,</span><br><span class="line">buy_3times_last_ratio decimal(10,2) comment &#39;repurchase rate (3+ purchases)&#39;,</span><br><span class="line">stat_mn string comment &#39;statistics month&#39;,</span><br><span class="line">stat_date string comment &#39;statistics date&#39;</span><br><span class="line">  ) COMMENT &#39;repurchase rate table&#39;</span><br><span class="line">row format delimited fields terminated by &#39;\t&#39;</span><br><span class="line">stored as parquet</span><br><span class="line">location &#39;&#x2F;warehouse&#x2F;ads&#x2F;mall&#x2F;sale_tm_category1_stat_mn&#x2F;&#39;</span><br><span class="line">tblproperties (&quot;parquet.compression&quot;&#x3D;&quot;snappy&quot;);</span><br></pre></td></tr></table></figure></li><li><p>Load the data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br></pre></td><td class="code"><pre><span class="line">#!&#x2F;bin&#x2F;bash</span><br><span class="line">db_date&#x3D;$&#123;date&#125;</span><br><span class="line">hive&#x3D;&#x2F;opt&#x2F;cloudera&#x2F;parcels&#x2F;CDH-6.2.0-1.cdh6.2.0.p0.967373&#x2F;bin&#x2F;hive</span><br><span class="line">APP1&#x3D;mall</span><br><span class="line">APP2&#x3D;ads</span><br><span class="line">table_name&#x3D;sale_tm_category1_stat_mn</span><br><span class="line">hive_table_name&#x3D;$APP2.mall__$table_name</span><br><span class="line"># If a date is supplied, use it; otherwise default to the day before today</span><br><span class="line">if [ -n &quot;$&#123;date&#125;&quot; ] ;then</span><br><span class="line">        db_date&#x3D;$&#123;date&#125;</span><br><span class="line">else </span><br><span class="line">        db_date&#x3D;&#96;date -d &quot;-1 day&quot; +%F&#96;</span><br><span class="line">fi</span><br><span class="line">sql&#x3D;&quot; </span><br><span class="line">insert into table $hive_table_name</span><br><span class="line">select</span><br><span class="line">mn.sku_tm_id,</span><br><span class="line">mn.sku_category1_id,</span><br><span class="line">mn.sku_category1_name,</span><br><span class="line">sum(if(mn.order_count&gt;&#x3D;1,1,0)) buycount,</span><br><span class="line">sum(if(mn.order_count&gt;&#x3D;2,1,0)) buyTwiceLast,</span><br><span 
class="line">sum(if(mn.order_count&gt;&#x3D;2,1,0))&#x2F;sum(if(mn.order_count&gt;&#x3D;1,1,0)) buyTwiceLastRatio,</span><br><span class="line">sum(if(mn.order_count&gt;&#x3D;3,1,0)) buy3timeLast,</span><br><span class="line">sum(if(mn.order_count&gt;&#x3D;3,1,0))&#x2F;sum(if(mn.order_count&gt;&#x3D;1,1,0)) buy3timeLastRatio,</span><br><span class="line">date_format(&#39;$db_date&#39;,&#39;yyyy-MM&#39;) stat_mn,</span><br><span class="line">&#39;$db_date&#39; stat_date</span><br><span class="line">from</span><br><span class="line">(</span><br><span class="line">select</span><br><span class="line">user_id,</span><br><span class="line">sd.sku_tm_id,</span><br><span class="line">sd.sku_category1_id,</span><br><span class="line">sd.sku_category1_name,</span><br><span class="line">sum(order_count) order_count</span><br><span class="line">from dws.mall__sale_detail_daycount sd</span><br><span class="line">where date_format(dt,&#39;yyyy-MM&#39;)&#x3D;date_format(&#39;$db_date&#39;,&#39;yyyy-MM&#39;)</span><br><span class="line">group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name</span><br><span class="line">) mn</span><br><span class="line">group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;</span><br><span class="line">&quot;</span><br><span class="line">$hive -e &quot;$sql&quot;</span><br></pre></td></tr></table></figure></li></ul><!-- rebuild by neat -->]]></content>
    
    <summary type="html">
    
      
      
        &lt;!-- build time:Tue Jan 12 2021 23:56:18 GMT+0800 (GMT+08:00) --&gt;&lt;p&gt;The following blog post is reposted from &lt;a href=&quot;https://www.cnblogs.com/ttzzyy/p/13255841.html&quot; target
      
    
    </summary>
    
    
      <category term="大数据" scheme="cpeixin.cn/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
      <category term="数据仓库" scheme="cpeixin.cn/tags/%E6%95%B0%E6%8D%AE%E4%BB%93%E5%BA%93/"/>
    
  </entry>
  
</feed>
