Podservice/Podcasting by the numbers


From Steeple




1 Podcasting by the numbers

This section is an overview of our current and anticipated usage levels in terms of data storage and movement. Experience has shown how quickly we exceeded our limited initial resources in the drive to deliver the iTunes U launch and the embryonic Podcasting Pilot. The Pilot Project Usage Levels section is more relevant at this time, outlining our current and near-future usage levels based on recent experience.

1.1 Past Experiences

The data in this section is summarised from notes collected during testing and while delivering the new iTunes U portal.

1.1.1 Media/Data files - Storage

These examples refer to video content and are slightly conservative estimates.


Package/Stage | Contents/Format | Average size per hour of content
(1) Original Material | Depends on source and format, e.g. video tape, Media Unit, screengrab | 50 GB
(2) Working Files | E.g. Final Cut project files, test encodings of rushes, clips of original material |
(3) Master Copy | Video in DV/MPEG2 format. Audio in AIF/WAV format | 12 GB (video), 1 GB (audio)
(4) Master Encodings | Compressed versions of the Master Copy files, e.g. 3 levels of MPEG4/H264, 2 levels of audio-only MP3, 1 form of high quality audio (AIF) | 2 GB (total)

Audio-only projects consume much less space, and initially comprise a greater percentage of Oxford podcasts. Stages 1 and 2 would typically result in 2 GB of data, and the Master Encodings (4) in about 1 GB.

1.1.1.1 Scaled up examples

Average project content length ~ 1 hr
Average project content size (video) ~ 70 GB
Average project content size (audio) ~ 4 GB

Approximately 1 video project produced a week ~ 40 projects per year
Annual storage increase required ~ 40 x 70 GB = 2800 GB = 2.8 TB

Approximately 2 audio projects produced a week ~ 80 projects per year
Annual storage increase required ~ 80 x 4 GB = 320 GB

We need to stress that these are average figures. Some input supplied by departments can be much larger (e.g. raw uncompressed video at HD = 240 GB per hour, or 66.6 MB per second). These files can be (and are) reworked over time and sometimes form additional material for later projects. The anticipated lifetime for “active” projects (i.e. material that is likely to be accessed and reused) is 18 months. Material that has not been touched in this period is then suitable for archive, backup or deletion.
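The scaled-up figures above can be reproduced with a short calculation. This is a minimal sketch in Python using only the averages quoted in this section; the function and constant names are illustrative. The "active working set" figure is an assumption added for illustration: it supposes a constant production rate and that everything is kept for the full 18-month active lifetime before being archived or deleted.

```python
# Average figures quoted in the text (GB per project, projects per year).
VIDEO_GB_PER_PROJECT = 70
AUDIO_GB_PER_PROJECT = 4
VIDEO_PROJECTS_PER_YEAR = 40
AUDIO_PROJECTS_PER_YEAR = 80

# Anticipated lifetime of "active" material: 18 months.
ACTIVE_LIFETIME_YEARS = 1.5


def annual_growth_gb(projects_per_year, gb_per_project):
    """Storage added per year for one type of project, in GB."""
    return projects_per_year * gb_per_project


video_gb = annual_growth_gb(VIDEO_PROJECTS_PER_YEAR, VIDEO_GB_PER_PROJECT)
audio_gb = annual_growth_gb(AUDIO_PROJECTS_PER_YEAR, AUDIO_GB_PER_PROJECT)
total_gb = video_gb + audio_gb

# Assumption (not from the text): steady production, nothing removed
# until the 18-month active lifetime has passed.
active_set_gb = total_gb * ACTIVE_LIFETIME_YEARS

print(f"Video growth per year: {video_gb} GB")
print(f"Audio growth per year: {audio_gb} GB")
print(f"Combined growth per year: {total_gb} GB")
print(f"Approximate active working set: {active_set_gb:.0f} GB")
```

Under those assumptions the store for active material would need to hold roughly a year and a half of combined growth, before allowing for the larger-than-average inputs mentioned above.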

1.1.2 Current setup and numbers

The initial 6 months surrounding the iTunes U launch (Jun-Dec ‘08) have left 800 GB of material in storage, plus 50 GB of supporting material (workflow packages, transition files, trailer edits, master settings, etc.). Around 600 GB of data was deleted or downsized during this time because of the lack of available storage space, which reduces the scope for reuse.

At present, all material in the working process (though not the video processing store) is held on a single external Firewire drive based in an OUCS office and accessed via the office network. This has proven barely tolerable. It imposes limitations (such as the extra time taken to copy data between the drive and local systems for use) and carries great risks: the drive is not being backed up successfully, the data is not duplicated anywhere else, and it is a single disk in frequent use. Encoding and processing of this data is done on a separate (dedicated) machine, which requires the Master Edit data to be transferred across the network to its own internal storage. Indeed, it can often take longer to upload the files for processing than it takes to process the data.

It is important to note that the publicly available (i.e. published) files are held on a separate storage solution within the Sysdev team and that hosting is of a standard suited to the task. More will be detailed on that in a later section.

1.1.3 Transfer rates

The current process is very inefficient because of the limited equipment available, which exacerbates problems related to data transfer. This section outlines current findings and details the transfer rates that a more practical solution would need to support.

Given the size and quantity of files we’re working with and the multiple processing locations, these values have a significant cumulative effect on workflows, adding hours of dead time to the time taken to produce a project.

Transfer | Theoretical Rate (1) | Tested Rate (2)
External hard drive to external hard drive (e.g. Firewire to Firewire on a Mac mini) | SATA 300 + FW800, limited by FW. 80% limit = 282 GB/h | 60 GB/h (17% capacity)
External hard drive to machine copy (e.g. Firewire to Mac mini internal) | SATA 300 + FW800, limited by FW. 80% limit = 282 GB/h | 127 GB/h (36% capacity) (3)
Internal file copy (e.g. duplicate a local file) | SATA 300. 80% limit = 860 GB/h | 72 GB/h (6% capacity)

(1) Rates are taken from the interface specifications and then reduced to 80% to give a more conservative, realistic estimate.

(2) Tested rates were measured in a series of repeated trials using a typical 18 GB file on our available hardware.

(3) The Lacie external 1 TB drive is actually two 500 GB drives in a RAID 0 configuration, which clearly boosts read speed in these trials. However, it also increases the risk of catastrophic data loss.

Transfer | Theoretical Rate (1) | Tested Rate (2)
Copy to networked file store over 100 Mbps Ethernet LAN (e.g. office machine to LTGShare). Likely the same for any desktop machine-to-machine copying via this network | Limited by LAN. 80% limit = 45 GB/h | 38 GB/h (67% capacity)
Copy to networked file store in OUCS Machine Room from a department with good networking (e.g. departmental server to OUCS server/podcaster2 on Gigabit networking) | Limited by LAN. 80% limit = 450 GB/h | No means to test at this stage
Local machine to Media.podcasts over sftp | Limited by LAN. 80% limit = 45 GB/h | 28 GB/h (50% capacity)
Local machine to XServe (4) | Limited by LAN | 25 GB/h (5)

(4) This is based on an average from network usage graphs over a 12-hour period of uploading.

(5) We suspect a large overhead from the transfer protocol used here.


It is also important to note that these tests were done during low usage periods (out of office hours) and typically with only one transfer in progress at a time, as opposed to regular team working practice.
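To give a feel for the dead time these rates imply, the sketch below estimates how long a single copy takes at each tested rate from the tables above. It is illustrative only: the route labels and function name are made up for the example, the 70 GB size is the average video project quoted earlier, and it assumes the tested rate is sustained for the whole copy with no contention from other users.

```python
# Tested rates from the tables above, in GB per hour.
TESTED_RATES_GB_PER_HOUR = {
    "external drive to external drive": 60,
    "external drive to internal disk": 127,
    "internal file copy": 72,
    "office machine to networked file store (100 Mbps LAN)": 38,
    "local machine to Media.podcasts over sftp": 28,
    "local machine to XServe": 25,
}

AVERAGE_VIDEO_PROJECT_GB = 70  # average video project size quoted earlier
TEST_FILE_GB = 18              # typical file used in the trials


def copy_time_minutes(size_gb, rate_gb_per_hour):
    """Minutes needed to copy size_gb at a sustained rate_gb_per_hour."""
    return size_gb / rate_gb_per_hour * 60


for route, rate in TESTED_RATES_GB_PER_HOUR.items():
    project = copy_time_minutes(AVERAGE_VIDEO_PROJECT_GB, rate)
    test_file = copy_time_minutes(TEST_FILE_GB, rate)
    print(f"{route}: {test_file:.0f} min for 18 GB, {project:.0f} min for 70 GB")
```

Because a project is usually copied more than once (to the editing station, to the encoder, and back to storage), these single-copy times add up across a workflow.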

1.1.4 Future considerations

We are presently using only 30% of our available processing capacity and incurring large delays due to data transfer and manual interactions. To remove this expense in time and money, we need to optimise and automate as much of the system as possible. We have another, higher specification XServe awaiting deployment which will speed up our processing and encoding; however, we can’t add this to the grid until a suitable shared file store is implemented. This processing node will be capable of handling 8 simultaneous encoding processes working at or above real time for DV inputs, i.e. it will need to be able to read 8 streams of data, the rate of which will depend on the input material. The table below outlines some of the rates that will need to be handled:


Format | Resolution and Compression | GB per hour | MB per second
SD Pre-Compressed | 720x576, H264 @ 1500 | 0.675 | 0.2
SD-PAL | 720x576, DV Stream | 12 | 31
HD-720p | 1280x720, HD Stream | 250 | 69
HD-1080p | 1920x1080, HD Stream | 535 | 149
Campaign Video (MPU output) | 1024x576, Uncompressed | 200 | 56

SD-PAL is presently the most common form of video processed, and is likely to remain so for the foreseeable future. This implies that an 8-core processing system would need to be able to read 8 data streams at 31 MB/s each = 248 MB/s, or approximately 2 Gbit/s.
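The aggregate read rate can be checked with a small calculation. This is a sketch assuming decimal units (1 GB = 1000 MB) and 8 bits per byte; the function names are illustrative. It uses the 31 MB per second SD-PAL figure exactly as given in the table. Note that converting the table's GB per hour column with the same assumptions gives about 69 MB/s for HD-720p, matching the table, but only about 3.3 MB/s for 12 GB per hour, so the SD-PAL per-second figure may have been derived differently from the per-hour one.

```python
def gb_per_hour_to_mb_per_second(gb_per_hour):
    """Convert a sustained rate in GB/hour to MB/second (decimal units)."""
    return gb_per_hour * 1000 / 3600


def aggregate_gbit_per_second(streams, mb_per_second_each):
    """Total read rate in Gbit/s for a number of equal, simultaneous streams."""
    return streams * mb_per_second_each * 8 / 1000


# 8 simultaneous encodes, each reading at the table's SD-PAL rate.
sd_pal_total = aggregate_gbit_per_second(8, 31)
print(f"8 x SD-PAL at 31 MB/s: {sd_pal_total:.2f} Gbit/s")

# Cross-check of the table's two rate columns.
for name, gb_per_hour in [("SD-PAL", 12), ("HD-720p", 250),
                          ("HD-1080p", 535), ("Campaign Video", 200)]:
    rate = gb_per_hour_to_mb_per_second(gb_per_hour)
    print(f"{name}: {gb_per_hour} GB/h is about {rate:.1f} MB/s")
```

Whichever per-stream figure is used, the same helper shows how the requirement grows if HD inputs become common: 8 streams at the HD-720p rate would need several times the SD-PAL bandwidth.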

The output files from the processing are much smaller, so their transfer requirements are correspondingly lower, typically in the range of the SD Pre-Compressed figures or less.

The setup will use two XServes: a Podcast Producer (PcP) head node (4-core system), and a second XServe (8-core system) acting as an XGrid Agent (i.e. 8 processing nodes). The system would scale by adding further processing nodes. Using currently available equipment we could potentially enable all 12 nodes; however, for licensing reasons we are not expecting to do that initially.

In summary, this media data needs to be accessed and worked upon from a variety of machines ranging from editing stations to processing/encoding servers.

1.2 Pilot Project Usage Levels

We are piloting a streamlined podcasting process with a limited number of content providers, in addition to supporting the growing number of university users outside our pilot projects. The pilots are with Medical Sciences, OUCS ITLP and the Language Centre, though we may also include the Media Production Unit in this project. The “Randoms” category discussed below refers to university users who are already using our service but are not part of our pilot schemes.

We can roughly group our typical inputs into 4 categories: Standard Definition Video (SD), High Definition Video (HD), Lecture Capture (LT) and Audio-only recordings (AU).


All output columns are in MB/h.

Format | Input (MB/h) | Video: High | Video: Med | Video: Low | Video: Archive | Audio: Med | Audio: Low | Audio: Archive | Total (MB/h) | Total (GB/h) | Notes
SD | 12000 | 680 | 350 | 160 | 12000 | 120 | 60 | 600 | 13970 | 14.0 | SD PAL, DV Video
HD | 60000 | 680 | 350 | 160 | 60000 | 120 | 60 | 600 | 61970 | 62.0 | HD PAL, HD 720p Stream
LT | 2000 | 680 | 350 | 160 | 2000 | 120 | 60 | 600 | 3970 | 4.0 | HD XVGA, H264 Video
AU | 600 | 0 | 0 | 0 | 0 | 120 | 60 | 600 | 780 | 0.8 | PCM WAV 44.1 kHz Stereo

The above table illustrates the data implications of each category. For video submissions, the original input is archived and three video derivations are generated, along with audio-only versions. For audio-only inputs, the outputs are two forms of compressed audio and the archive copy.
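The totals in the table follow directly from that description, as this minimal sketch shows. The dictionary and function names are illustrative; the numbers are the MB/h values from the table. It assumes, as the table does, that every category produces the same derived outputs and that video categories also keep a 600 MB/h audio archive copy.

```python
# MB per hour of content, taken from the table above.
INPUT_MB_PER_HOUR = {"SD": 12000, "HD": 60000, "LT": 2000, "AU": 600}
VIDEO_OUTPUTS = {"high": 680, "med": 350, "low": 160}
AUDIO_OUTPUTS = {"med": 120, "low": 60}
AUDIO_ARCHIVE = 600


def stored_mb_per_hour(fmt):
    """Total MB stored per hour of content for one input category."""
    audio = sum(AUDIO_OUTPUTS.values()) + AUDIO_ARCHIVE
    if fmt == "AU":
        # Audio-only input: two compressed audio forms plus the archive copy.
        return audio
    # Video input: archived original, three video derivations, plus audio.
    return INPUT_MB_PER_HOUR[fmt] + sum(VIDEO_OUTPUTS.values()) + audio


for fmt in INPUT_MB_PER_HOUR:
    total = stored_mb_per_hour(fmt)
    print(f"{fmt}: {total} MB/h ({total / 1000:.1f} GB/h)")
```

For the video categories the archived original dominates the total, which is why the HD figure is more than four times the SD figure even though the derived outputs are identical.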

Using these categories and our current experience, we anticipate that usage during the pilot phase of the podcasting service will resemble the figures below. The average length of each podcast is approximately one hour.


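Scenarios (submissions per week)

Pilot Use | Randoms | Medsci | OUCS | Language Centre | Total (MB) | Archive (MB) | Outputs (MB)
SD | 0.5 | 0 | 1 | 1 | 34925 | 31500 | 3425
HD | 0.25 | 0 | 0 | 0 | 15492.5 | 15150 | 342.5
LT | 0 | 4 | 2 | 1 | 27790 | 18200 | 9590
AU | 4 | 0 | 0 | 1 | 3900 | 3000 | 900
Overall growth per week (MB) | | | | | 82107.5 | 67850 | 14257.5
Overall growth per week (GB) | | | | | 82.1 | 67.9 | 14.3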

We are expecting 4 lecture recordings a week from Medical Sciences; 2 lectures and a video from OUCS/ITLP; a video, a lecture and an audio recording from the Language Centre; and, based on experience so far, submissions from the rest of our podcasting community (the “Randoms”) in the region of 1 HD video per month, 2 SD videos per month and 4 audio recordings a week.

This implies that we will be creating or gaining 82.1 GB of data a week during service usage.
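The weekly figure can be reproduced from the two tables in this section. The sketch below is illustrative only: it multiplies each category's total MB per hour by the expected number of one-hour submissions per week, using the values listed above, and the names are chosen for the example.

```python
# Total MB stored per hour of content, per category (from the first table).
TOTAL_MB_PER_HOUR = {"SD": 13970, "HD": 61970, "LT": 3970, "AU": 780}

# Expected submissions per week, each assumed to be about one hour long.
SUBMISSIONS_PER_WEEK = {
    "SD": {"Randoms": 0.5, "Medsci": 0, "OUCS": 1, "Language Centre": 1},
    "HD": {"Randoms": 0.25, "Medsci": 0, "OUCS": 0, "Language Centre": 0},
    "LT": {"Randoms": 0, "Medsci": 4, "OUCS": 2, "Language Centre": 1},
    "AU": {"Randoms": 4, "Medsci": 0, "OUCS": 0, "Language Centre": 1},
}


def weekly_growth_mb(fmt):
    """MB added per week for one category across all contributors."""
    hours = sum(SUBMISSIONS_PER_WEEK[fmt].values())
    return TOTAL_MB_PER_HOUR[fmt] * hours


per_category = {fmt: weekly_growth_mb(fmt) for fmt in TOTAL_MB_PER_HOUR}
overall_mb = sum(per_category.values())

for fmt, mb in per_category.items():
    print(f"{fmt}: {mb} MB per week")
print(f"Overall: {overall_mb} MB per week ({overall_mb / 1000:.1f} GB)")
```

Changing a single entry in the submissions table shows how sensitive the total is to HD material: one extra HD submission a week would add roughly 62 GB on its own.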

Under our devolved model it is very difficult to predict usage levels such as the above, because the activities of university contributors are unknown. However, the trend observed is that generation of podcasting material is growing, and that this growth is accelerating. Because of the variety of capture methods available, real inputs also span a wider range of sizes than the four categories used here.